FOREWORD


The Software Engineering Laboratory (SEL) is an organization sponsored by the National 
Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC) and 
created to investigate the effectiveness of software engineering technologies when applied 
to the development of applications software. The SEL was created in 1976 and has three 
primary organizational members: 

NASA/GSFC, Flight Dynamics Systems Branch 

The University of Maryland, Department of Computer Science 

Computer Sciences Corporation, Development and Systems Engineering organization 

The goals of the SEL are (1) to understand the software development process in the 
GSFC environment; (2) to measure the effects of various methodologies, tools, and 
models on this process; and (3) to identify and then to apply successful development 
practices. The activities, findings, and recommendations of the SEL are recorded in the 
Software Engineering Laboratory Series, a continuing series of reports that includes this 
document. 

Documents from the Software Engineering Laboratory Series can be obtained via the SEL 
homepage at: 

http://fdd.gsfc.nasa.gov/seltext.html 
or by writing to: 

Flight Dynamics Systems Branch 
Code 551 

Goddard Space Flight Center 
Greenbelt, Maryland 20771 


The views and findings expressed 
herein are those of the authors and 
presenters and do not necessarily 
represent the views, estimates, or 
policies of the SEL. All material 
herein is reprinted as submitted by 
authors and presenters, who are 
solely responsible for compliance 
with any relevant copyright, patent, 
or other proprietary restrictions. 
Abstract

NASA’s Software Engineering Laboratory (SEL), one of the earliest pioneers in the 
areas of software process improvement and measurement, has had a significant 
impact on the software business at NASA Goddard. At the heart of the SEL’s 
improvement program is a belief that software products can be improved by 
optimizing the software engineering process used to develop them and a long-term 
improvement strategy that facilitates small incremental improvements that 
accumulate into significant gains. As a result of its efforts, the SEL has incrementally 
reduced development costs by 60%, decreased error rates by 85%, and reduced cycle 
time by 25%. In this paper, we analyze the SEL’s experiences on three major 
improvement initiatives to better understand the cyclic nature of the improvement 
process and to understand why some improvements take much longer than others. 


Background 

Since 1976, the Software Engineering Laboratory (SEL) has been dedicated to understanding and 
improving the way in which one NASA organization, the Flight Dynamics Division (FDD) at 
Goddard Space Flight Center, develops, maintains, and manages complex flight dynamics systems. It 
has done this by developing and refining a continual process improvement approach that allows an 
organization such as the FDD to fine tune its process for its particular domain. Experimental software 
engineering and measurement play a significant role in this approach. 

The SEL is a partnership of NASA Goddard’s FDD, its major software contractor, Computer
Sciences Corporation (CSC), and the University of Maryland’s (UM) Department of Computer
Science. The FDD primarily builds software systems that provide ground-based flight dynamics 
support for scientific satellites. They fall into two sets: ground systems and simulators. Ground 
systems are midsize systems that average around 250 thousand source lines of code (KSLOC). 
Ground system development projects typically last approximately 2 years. Most of the systems have 
been built in FORTRAN on mainframes, but recent projects contain subsystems written in C and C++ 
on workstations. The simulators are smaller systems averaging around 60 KSLOC that provide the 
test data for the ground systems. Simulator development lasts between 1 and 1.5 years. Most of the 
simulators have been built in Ada on a VAX computer. The project characteristics of these systems 
are shown in Table 1. The SEL is responsible for the management and continual improvement of the 
software engineering processes used on these FDD projects. 
Table 1. Characteristics of SEL Projects

                                        Applications
Characteristics         Ground Systems                  Simulators

System Size             150 - 400 KSLOC                 40 - 80 KSLOC
Project Duration        1.5 - 2.5 years                 1 - 1.5 years
Staffing (technical)    10 - 35 staff-years             1 - 7 staff-years
Language                FORTRAN, C, C++                 Ada, FORTRAN
Hardware                IBM Mainframes, Workstations    VAX

The SEL process improvement approach shown in Figure 1 is based on the Quality Improvement 
Paradigm [Reference 1] in which process changes and new technologies are 1) selected based on a 
solid understanding of organization characteristics, needs, and business goals; 2) piloted and assessed 
using the scientific method to identify those that add value; and 3) packaged for broader use 
throughout the organization. Using this approach, the SEL has successfully established and matured 
its process improvement program throughout the organization. 


[Figure 1: three iterated stages plotted against TIME]

UNDERSTANDING: Know your software business
• What are my software characteristics?
• What process do we use?
• What are our goals?

ASSESSING: Determine effective improvements
• Determine improvements and set goals
• Measure changed process and product
• Analyze impact of process change on product

PACKAGING: Make improvements part of your business
• Update standards
• Refine training

Figure 1. SEL Process Improvement Paradigm


The SEL organization consists of three functional areas: software developers, software engineering 
process analysts, and data base support (Figure 2). The largest part of the SEL is the 150 to 200 
software personnel who are responsible for the development and maintenance of over 4 million 
source lines of code (SLOC) that provide orbit and attitude ground support for all Goddard missions. 
Since the SEL was founded, software project personnel have provided software measurement data on 
over 130 projects. This data has been collected by data base support personnel and stored in the SEL
data base for use by software project personnel and process analysts. The process analysts are 
responsible for defining the experiments and studies, analyzing the data, and producing reports. These 
reports affect such things as project standards, development procedures, and how projects are 
managed. The data base support staff is responsible for entering measurement data into the SEL data 
base, quality assuring the data, and maintaining the data base and its reports. 


[Figure 2: three functional areas exchanging measures, refined process, and reports]

DEVELOPERS (NASA & CSC)
Staff level: 150 - 200
Function: Develop software

PROCESS ANALYSTS (NASA & CSC & UM)
Staff level: 10 - 15
Function: Design studies, perform analysis, refine process

DATA BASE SUPPORT (NASA & CSC)
Staff level: 2 - 3
Function: Process, QA, and archive data

SEL Data Base: 130 projects
Reports Library: SEL reports, project documents

Figure 2. SEL Organizational Structure

Improvement Cycles 

Although the improvement process is a never-ending endeavor, it is cyclic in nature. At the SEL, 
improvement cycles operate within the context of the SEL process improvement paradigm. Each 
improvement cycle tends to focus on a single organizational goal and only one or two process or 
technology changes that address that goal. Often these build on earlier experimental results. Each 
SEL improvement cycle has four major steps: 

1. Each improvement cycle begins with setting improvement goals based on the current business 
needs and strategic direction of the organization. Based on a solid understanding of the problem 
domain (application), the development environment, and the current process and product 
characteristics of the organization, process analysts identify leverage areas, i.e., software process 
or product characteristics that could have a significant impact on the overall performance of the 
organization. For example, increasing software reuse would have a high probability of reducing 
project cost and development cycle time. Therefore, if the business goals are to reduce cost and/or 
cycle time, increasing reuse would be a reasonable leverage area. 

2. The next step is to identify software engineering technologies (processes, methods, and/or tools) 
that are likely to affect the leverage area. For example, object-oriented techniques (OOT) are 
reported to facilitate reuse. The ultimate goal of this step is to select one technology or process 
change that has the greatest potential for meeting the improvement goal. 

3. The third and longest step of the improvement cycle is to conduct experiments to understand the 
value and applicability of the new technology in the local organization. Scientific methods are 
used to pilot the technology on one or more real projects and observe the use and effect of the 
technology on the development process, products, and project performance. Process analysts use
both qualitative feedback and quantitative measurements to evaluate the value of the 
technology/process change. Key project measurements are compared with those from a control 
group (similar contemporary projects using the standard process) to assess overall value. Several 
experiments that successively refine the process/technology may be required before it is ready to 
deploy. 

4. The final step in an improvement cycle is to deploy the beneficial process/technology throughout 
the organization. This involves integrating the process change/technology into the standard 
process guidebooks, providing training to project personnel, and providing ongoing process 
consulting support to facilitate the adoption of the new technology/process change. 
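
The control-group comparison in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SEL's actual analysis procedure: the function name compare_to_control, the choice of measure (errors per KSLOC), and every number below are hypothetical.

```python
from statistics import mean


def compare_to_control(pilot, control):
    """Return the relative difference of the pilot mean from the control mean.

    A negative result means the pilot projects scored lower than the control
    group on this measure, which is desirable for error rates or cost.
    """
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; relative difference is undefined")
    return (mean(pilot) - control_mean) / control_mean


# Hypothetical errors per KSLOC; these are NOT SEL measurements.
control_error_rates = [6.0, 7.0, 6.5]  # similar contemporary projects, standard process
pilot_error_rates = [4.0, 5.0]         # projects piloting the new technique

change = compare_to_control(pilot_error_rates, control_error_rates)
print(f"Pilot error rate differs from control by {change:+.0%}")
```

With so few projects in each group, a difference like this would be read alongside the qualitative feedback the text describes rather than treated as conclusive on its own.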

Since its inception, the SEL has completed numerous improvement cycles spanning from 1 to 7 years. 
The amount of time it takes to complete a cycle depends on the maturity and breadth of the 
technology/process change. Several improvement cycles are usually active at one time; however, they 
involve different subsets of the organization’s projects. 

SEL Improvement Examples

In 1985, the SEL set two fairly common improvement goals: 1) reduce the cost of developing 
software systems and 2) improve the quality of delivered systems. In 1990, in response to NASA’s 
new emphasis on launching missions more quickly, a third goal was adopted: 3) reduce the cycle time 
needed to develop new systems. All of these goals were addressed by leveraging different process and 
technology areas within the context of a unified improvement program. 

The following examples illustrate the different approaches taken and results achieved within three 
representative SEL improvement initiatives. As shown in Table 2, each initiative used a different 
number of improvement cycles and a somewhat different deployment strategy to achieve the desired 
results. The number of improvement cycles was driven by the experiment approach and results, while 
the deployment strategy was selected based on a risk/benefit analysis of the process change using the 
experiment results. 


Table 2. SEL Improvement Examples

Goal              Improvement       Cycles   Experimentation Approach        Deployment Approach
                  Initiative

Reduce Cost       Maximize          2        Iterative learning of how to    Full use in highest
                  Reuse                      apply OO concepts; develop      payback applications
                                             new reuse methods               (subset of projects)

Increase          Leverage          3        Iterative refinement of         Subset of ‘best’
Quality           Human                      existing, external testing      techniques across all
                  Discipline                 techniques and Cleanroom        projects
                                             Methodology

Reduce Cycle      Streamline        1        Refine and consolidate local    Full use across all
Time/Cost         Testing                    (familiar) processes            projects
                  Process

Example 1: Maximizing Reuse


To reduce costs, the SEL chose to introduce and experiment with two software engineering 
technologies, the Ada language and object-oriented design (OOD), that had high potential for 
maximizing software reuse. Experimentation began across a single class of applications, flight 
dynamics simulators, as the first improvement cycle focused on defining a generalized architecture 
based on more theoretical OO concepts. Once the developers were able to apply the architecture to 
their systems, the application scope expanded to include generalizing more elements of flight 
dynamics systems. The second group of experiments expanded the definition of ‘generalized’ to 
include reusable specifications, which has resulted in a large library of reusable flight dynamics 
components. Figure 3 shows the experimental focus areas and timeline for these two improvement 
cycles. Because the early work with OO was more conceptual, several phases of experimentation 
across different development projects were undertaken prior to deploying the supporting process 
changes. 



[Figure 3: timeline from 1985 to 1995 showing experimentation and deployment phases. Labels: OO Concepts, Generalized Architectures, Reuse of Architectures, Reusable Specifications, Reuse Library Components]

Figure 3. SEL Reuse Improvement Cycle Timeline


Within 4 years, this effort culminated in the first deployment of reusable generalized architectures 
that have led to a 300% increase in software reuse per system and an overall cost reduction of 55% 
during the next 4 years [Reference 2]. Further development of these object-oriented concepts has 
produced a set of reusable specifications and a corresponding component library that promises even 
greater improvements in 1997 systems. Figure 4 depicts the measured impact to the FDD resulting 
from these changes. 
[Figure 4: two charts, Percent Reuse and Total Cost per Mission]

Figure 4. Results of Introducing OOD and Ada


Example 2: Leveraging Human Discipline 

Early experimental results showed the positive impact on software development from leveraging the 
experience and perspective of the individual developer. Based on these results, the SEL chose to 
focus on software engineering methodologies that support human discipline to meet our quality goal 
[Reference 3]. The first improvement cycle, which investigated different testing techniques such as 
code reading and unit and functional testing, confirmed that those methods which relied on human 
discipline were the most effective. This led to a significant effort within the SEL to maximize the 
potential of human discipline by experimenting with the Cleanroom Methodology [Reference 4]. 

The SEL has completed two improvement cycles over four projects (two large, two small) that 
specifically addressed Cleanroom; the initial SEL Cleanroom project began in 1988, with the fourth 
and final effort completed this year. The focus of the Cleanroom Methodology is on producing 
software with high probability of zero defects. The key elements of the methodology include an 
emphasis on human discipline in the development process via code inspections and requirements 
classification, and a statistical testing approach based on anticipated operational usage. Development 
and testing teams operate independently, and all development team activities are performed without 
on-line testing. Analysis of the first three Cleanroom efforts had indicated greater success in applying 
the methodology to smaller (< 50K developed lines of code (DLOC)) in-house Goddard projects than 
to larger scale efforts typically staffed by joint contractor-government teams. The final Cleanroom 
project involved the development of a large-scale system (480K SLOC, 140K DLOC). The primary 
study goal was to examine it as an additional data point in the SEL’s analysis of Cleanroom 
applicability in the organization, especially in the area of scalability. 
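
The statistical testing element of Cleanroom, in which tests are drawn according to anticipated operational usage, can be sketched as follows. This is a minimal illustration under stated assumptions: the operation names, their usage fractions, and the function draw_test_cases are hypothetical and do not come from the SEL studies.

```python
import random
from collections import Counter


def draw_test_cases(usage_profile, n_cases, seed=0):
    """Sample operations in proportion to their anticipated operational usage."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    operations = list(usage_profile)
    weights = [usage_profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n_cases)


# Hypothetical operational profile: fraction of anticipated use per operation.
profile = {
    "attitude_determination": 0.6,
    "sensor_calibration": 0.3,
    "report_generation": 0.1,
}

cases = draw_test_cases(profile, n_cases=1000)
counts = Counter(cases)
for op in profile:
    print(op, counts[op])
```

Operations expected to be used most often receive the most test cases, so the failures observed in testing approximate what users would encounter in operation.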

The goal of the SEL’s Cleanroom study was not to make a decision on adopting Cleanroom in its 
entirety within the organization, but rather to highlight those aspects of the methodology that had 
favorable impacts and to incorporate them into the standard approach. This approach of incremental 
deployment, shown in Figure 5, proved very successful in instilling these changes throughout the 
organization. Experimentation with Cleanroom raised the general awareness of the organization 
regarding quality techniques and discipline-enhancing processes. This emphasis is one of the key
reasons for the FDD’s steady improvement in reducing development error rates by 85% over a
15-year period, as shown in Figure 6.


[Figure 5: timeline from 1984 to 1996 showing experimentation and deployment phases. Labels: Code Reading & Functional UT; Code Inspection & Reqs. Classification; Cleanroom, Small Projects; Cleanroom, Large Projects; Usage Testing]

Figure 5. SEL Quality Improvement Cycle Timeline


[Figure 6: chart of Development Error Rates (1976 - 1995)]

Figure 6. Quality Improvement in the SEL


Example 3: Streamlining the Testing Process 

In 1992, the SEL saw the cost of system development decreasing significantly due to increasing code 
reuse; however, no corresponding decrease in development cycle time was occurring. In addition, 
although the cost associated with design and code effort had been reduced, testing costs remained 
virtually the same. This led to an assessment of the testing processes in use and resulted in a decision 
to focus testing in one group. This group, called the independent testers, effectively collapsed two 
separate testing phases (system and acceptance) conducted by two different groups (developers and 
users) into one phase (independent) performed by one group composed of experienced flight
dynamics analysts and testers. This change, in both process and organization, was introduced in order 
to reduce the cycle time required to deliver a system, to improve the efficiency of the testing process, 
and to do so without sacrificing the quality of any product delivered. 

Since this change was limited to one organization that was already heavily involved in defining the 
new testing process, the experimentation portion was brief and the risk of full deployment was judged 
to be low (Figure 7). Once the organizational changes were made, process changes were implemented 
quickly, simultaneously across all current test efforts. The resultant measurements (Figure 8) indicate 
that independent testing has yielded a definite shift in life-cycle effort distribution, with the testing 
effort being reduced from 41% to 31% of the total project effort [Reference 5]. Reductions in cycle 
time on the order of 5% to 20% have been verified with no loss of quality. 


[Figure 7: timeline from 1992 to 1996 showing experimentation and deployment phases. Labels: Form independent testing group & define new test process; New Independent Test Team Approach]

Figure 7. SEL Independent Testing Improvement Cycle Timeline

Figure 8. Results of Streamlining the System Testing Process

Measuring Overall Improvement


Each of the above initiatives resulted in measurable improvement; however, each was measured in 
isolation on a particular set of projects. On a long-term, continual improvement program, it is 
important to periodically assess how the organization is doing as a whole. To make this assessment 
and to update the organizational characteristics that will drive future improvement decisions, the SEL 
periodically computes an organizational baseline. This consists of key measurements that characterize 
the performance of the project organization over a specified time period and represent the 
organization’s ability to perform similar work in the future. 

We use a fairly small set of baseline measurements to evaluate improvement. They include total cost, 
total duration, development error rate, reuse percentage, cost per new line of code, and cost per 
delivered line of code. For each baseline measurement, a maximum, a minimum, and a project mean 
are computed for a particular time period, referred to as the baseline period. Overall improvement in 
each measurement is determined by comparing the means of two baseline periods, i.e., (current mean 
- previous mean) / previous mean. 
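
The baseline computation described above can be sketched in Python. The two function names are illustrative, and the project values passed to baseline_summary are hypothetical. The figures passed to relative_change are the averages quoted on the presentation slides later in this document; treating the 564 and 254 staff-months per mission as directly comparable baseline means is an assumption made here for illustration.

```python
from statistics import mean


def baseline_summary(values):
    """Return (minimum, maximum, mean) of one measurement over a baseline period."""
    return min(values), max(values), mean(values)


def relative_change(previous_mean, current_mean):
    """Compute (current mean - previous mean) / previous mean."""
    if previous_mean == 0:
        raise ValueError("previous mean is zero; relative change is undefined")
    return (current_mean - previous_mean) / previous_mean


# Hypothetical percent-reuse values for the projects in one baseline period.
low, high, average = baseline_summary([12.0, 20.0, 22.0])

# Averages quoted on the slides (pairing of the two cost figures is assumed).
reuse_change = relative_change(18, 73)   # average percent reuse, 1985-1989 vs 1990-1992
cost_change = relative_change(564, 254)  # staff-months per mission
print(f"reuse: {reuse_change:+.0%}, cost: {cost_change:+.0%}")
```

Under these assumptions the results are about +306% for reuse and -55% for cost, in line with the roughly 300% reuse increase and 55% cost reduction reported for the first reuse improvement cycle.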

Since 1985, the SEL has computed three baselines to measure overall improvement. Figure 9 shows 
when these baselines were computed in relation to the three examples discussed earlier. Notice that 
baselines were computed a few years after a set of improvements were deployed, allowing time for 
projects to use the improved process. Figures 10 and 11 show how the results of the individual 
initiatives combined to make significant overall improvements. 


[Figure 9: timelines for the three initiatives (Reuse & Ada/OO; Unit Testing & Cleanroom; Test Teams) shown in relation to the baseline computations]

Figure 9. Improvement Cycle Timelines


SEW Proceedings 


11 


SEL-96-002 



The SEL’s recently completed 1996 organizational baseline shows across-the-board improvement in
all measurements:

• Average mission cost decreased by 15% when compared with the 1993 baseline, totaling a 60% 
overall reduction in mission cost since 1985 (Figure 10). 

• The cost to develop a line of new code decreased nearly 35% since 1993. (There had been no 
previous improvement in this measure.) 

• Ground system projects saw a modest 7% reduction in project cycle time, while simulators 
experienced a 20% reduction since 1993 (Figure 8). 

• Error rates continued to drop, with a 40% reduction in development error rates since 1993. This 
combines with earlier improvements to total an 85% drop in development error rates over the past 
10 years (Figure 6). 

• After the initial 300% increase in reuse seen in the 1993 baseline, software reuse remained high, 
with an average of 80% on all projects; however, the minimum amount of reuse has now risen 
from 18% in the 1993 baseline project set to 62% in the recent project set, demonstrating a much 
more consistent use of reusable products in the SEL (Figure 11). 
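
The way a reduction since 1993 combines with earlier improvements can be checked with a short calculation. The sketch below assumes that successive reductions compound multiplicatively and that the 55% cost reduction reported for the reuse initiative represents the change up to the 1993 baseline; both are assumptions made for illustration, and the function names are hypothetical.

```python
def compound_reduction(reductions):
    """Combine successive fractional reductions into one overall reduction."""
    remaining = 1.0
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reduction must be a fraction between 0 and 1")
        remaining *= 1.0 - r
    return 1.0 - remaining


def implied_earlier_reduction(total, latest):
    """Reduction that must have preceded `latest` to reach `total` overall."""
    if latest >= 1.0:
        raise ValueError("latest reduction must be below 100%")
    return 1.0 - (1.0 - total) / (1.0 - latest)


# Mission cost: assumed 55% reduction by 1993, then a further 15%.
cost_total = compound_reduction([0.55, 0.15])

# Error rates: 85% overall, 40% of it since 1993.
error_prior = implied_earlier_reduction(total=0.85, latest=0.40)

print(f"overall cost reduction: {cost_total:.1%}")
print(f"implied error-rate reduction before 1993: {error_prior:.1%}")
```

Under these assumptions the cost figures compound to about 62%, close to the reported 60%, and the error-rate figures imply a reduction of about 75% before the 1993 baseline.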


[Figure 10: chart showing a 60% overall cost reduction]

Figure 10. Overall Cost Reduction in SEL

[Figure 11: chart showing a 340% total increase in reuse]

Figure 11. Overall Improvement in Reuse

Observations and Conclusions 

The SEL’s success with incremental process change, as opposed to leading-edge technology adoption, 
has led us to select the experimental approach to changing process gradually. Experimentation has 
allowed the beneficial changes to be deployed incrementally with low risk to ongoing projects. 
Deployment has been quicker for those process changes that were confined to a single phase or 
development activity, as with the test team process change. Following are several observations and 
recommendations based on our analysis of the improvement cycles discussed in this paper. 

• Focus on a single goal for each process/technology change to provide a clear definition of the 
expected change and non-ambiguous measurement of its effect. There is a temptation to overload a 
single project with multiple changes, often in the hope that at least one will work. SEL experience 
suggests that this approach will not result in sustained improvement; it will only confuse the team and 
obscure the impact of the individual technologies. 

• Select process changes that leverage people’s talents. Processes that enhance human discipline and
intellectual abilities provide significant improvement. Tools should be used to replace or facilitate 
routine tasks such as configuration and change management. 

• Allocate sufficient experimental time for tailoring and iterative application/learning of new
concepts. The SEL’s experience in first developing OOD concepts followed by a generalized 
architecture, prior to deployment, shows the benefit of taking a little more time to develop a more 
usable product (the architecture) rather than deploying the more abstract concepts first. 

• Set improvement time expectations appropriately. The more familiar the organization is with the 
process being changed, the faster it can be tuned and deployed and its impact realized. Existing 
processes can be refined and adapted more rapidly than abstract concepts; however, adapting an 
external (unfamiliar) process, such as Cleanroom, will take longer than refining an existing local
process, such as streamlining the SEL testing process. 

• Deploy a subset of the changes as soon as the benefit is shown. Often it is clear that certain 
subprocesses or techniques are very beneficial even though the entire new process/technology may 
not yet be proven. Early deployment allows the organization to reap its benefits as early as possible 
and paves the way for the rest of the method that may follow. 

Presentation Outline


■ What is an improvement cycle? 

♦ Relationship to SEL Improvement Approach 

♦ Improvement cycle steps 

■ Compare/contrast SEL examples 

♦ Reuse 

♦ Quality Techniques 

♦ Independent Testing 

What Is an Improvement Cycle?

Iterations of experimentation followed by
deployment to satisfy an organizational goal


Package 


Assess 


Understand 


SEL Improvement Paradigm 



Step 1 - Use Understanding of 
Process and Environment 



■ What’s inside/outside organization’s control 
(requirements changes, deadlines) 

■ Current baseline measures of organizational 
performance (effort, schedule, errors) 

■ Process characteristics (work activities) 

■ How people spend their time 

Step 2 - Select Process Change
Based on Organizational Goals

Goal                Leverage Area                   Experimental Focus

Decrease Cost       Maximize reuse                  Ada, object oriented techniques
                    Minimize rework                 CASE
                    Eliminate process redundancy    Combine phases

Increase Quality    Increase personal discipline    Cleanroom, personal software process
                    Detect errors earlier           Testing and review methods


Step 3 - Follow Experimental
Approach



■ Select measures to fulfill experimental goals 

■ Iterate on multiple projects, using multiple 
techniques 

♦ Established methods: Pilot/Refine 

♦ Conceptual methods: Create/Pilot/Refine 

■ Involve development organization in feedback 
loop 

Step 4 - Deploy Throughout
Organization 



■ Document process to appropriate level 

■ Provide training for new element in the 
context of the existing process 

■ Reinforce use by publicizing results to 
development organization 


Example 1 - Reuse

Improvement Goal
• Reduce cost

Baseline Measures
• 20% code reuse per system
• 564 staffmonths per mission

Leverage Area
• Increase software reuse

Process or Technology
• Use Ada language
• Apply object-oriented concepts

Expectations
• 40% code reuse per system
• Reduced cost per mission

Experiment Approach
• Iterative learning
• Multiple small projects

Deployment
• Full use in highest payback applications
• Just-in-time training by local experts

Reuse Improvement Cycles

2 major improvement cycles
Iterative learning of how to apply OO concepts
Scope: Increased from code to specifications reuse

[Timeline chart. Reuse focus: Design & Code, then Specifications, Design & Code]

Reuse - Results of
First Improvement Cycle

Improvement exceeded expectations.

Percent Reuse    1985 - 1989    1990 - 1992

Max              35%            97%
Avg              18%            73%
Min              11%            18%

300% Increase in Reuse

[Chart: Total Cost per Mission]

Example 2 - Quality Techniques

Improvement Goal
• Increase quality

Baseline Measures
• 6.5 errors per KSLOC

Leverage Area
• Human discipline

Process or Technology
• Various testing and review techniques
• Cleanroom Methodology

Expectations
• Fewer errors during development and use
• No additional cost

Experiment Approach
• Iterative refinement
• Controlled experiments; sequential projects

Deployment
• Broad use of most beneficial subset of techniques
• Subset included in standard process


Quality Improvement Cycles 


3 improvement cycles 

Iterative refinement of existing technologies 

Scope: Small to larger projects; unit to full system testing 

Quality Techniques - Results



Intermediate deployment drove steady decrease in error rates. 
85% improvement over 15 years. 


Example 3 - Independent Testing

Improvement Goal
• Reduce cost and schedule

Baseline Measures
• 254 staffmonths per mission
• 106 weeks per mission

Leverage Area
• Eliminate process redundancy

Process or Technology
• Form independent test teams from system and acceptance test groups
• Overlap testing and development of builds

Expectations
• Reduced cost and schedule per mission
• No loss of quality

Experiment Approach
• Define process
• Reorganize and pilot

Deployment
• Full use on all applications
• Test teams fine-tune process

1 improvement cycle
Refinement of existing process and organization change
Scope: Piloted on all projects immediately

[Timeline chart from 1992 to 1996 showing experimentation and deployment phases. Label: Form independent testing group & define new test process]

Independent Test Teams - Results

Overall Improvement










Keys to Success 


■ Focus on one primary organizational goal 

■ Select process changes that leverage people 
(use technology to replace routine tasks) 

■ Allocate more time (iterations) when creating 
process from concepts 

■ Actively seek developer feedback 


Conclusions 


■ More localized process changes lead to more 
rapid rate of improvement .... 

. . . but, broader conceptual changes result in 
larger improvements. 

■ Experimentation allows for intermediate 
deployment of new process or technology 
with minimal risk 

Evolving the Reuse Process at the Flight Dynamics Division
(FDD) Goddard Space Flight Center

S. Condon,1 C. Seaman,2 V. Basili,2 S. Kraft,3 J. Kontio,2 Y. Kim2


Abstract 

This paper presents the interim results from the 
Software Engineering Laboratory's (SEL) Reuse 
Study. The team conducting this study has, over 
the past few months, been studying the 
Generalized Support Software (GSS) domain asset 
library and architecture, and the various processes 
associated with it. In particular, we have 
characterized the process used to configure GSS- 
based attitude ground support systems (AGSS) to 
support satellite missions at NASA’s Goddard 
Space Flight Center. To do this, we built detailed 
models of the tasks involved, the people who 
perform these tasks, and the interdependencies and 
information flows among these people. These 
models were based on information gleaned from 
numerous interviews with people involved in this 
process at various levels. We also analyzed effort 
data in order to determine the cost savings in 
moving from actual development of AGSSs to 
support each mission (which was necessary before 
GSS was available) to configuring AGSS software 
from the domain asset library. 


While characterizing the GSS process, we became
aware of several interesting factors which affect
the successful continued use of GSS. Many of
these issues fall under the subject of evolving
technologies, which were not available at the
inception of GSS, but are now. Some of these
technologies could be incorporated into the GSS
process, thus making the whole asset library more
usable. Other technologies are being considered
as an alternative to the GSS process altogether. In
this paper, we outline some of the issues we will be
considering in our continued study of GSS and the
impact of evolving technologies.


1. Introduction

Since 1985 the Software Engineering Laboratory
(SEL) has been evolving methods of software
reuse through a series of studies, experiments,
pilot projects, and full-fledged development
projects at the Flight Dynamics Division (FDD) of
NASA’s Goddard Space Flight Center (GSFC).
The SEL adopted Ada83 for these experiments
and projects at a time when C++ was still
relatively unknown. From this Ada work, the SEL
determined that object-oriented (O-O) technology
was providing the best reuse benefits within the
FDD.

Around 1989-90 the Ada/O-O experience merged
with an FDD-wide initiative to develop a
"configurable" flight dynamics attitude support
system. The result evolved into the Generalized
Support Software (GSS) Domain Engineering
Process. By means of this process, the FDD has
shifted from developing applications to
configuring applications out of generalized,
reusable assets. The term "assets" encompasses
design specifications, code components, tools, and
standards. To date, eight applications, supporting
two NASA satellite missions, have been
configured from the GSS asset library and
delivered to acceptance testing.


A SEL Reuse Study team was tasked to analyze 
the GSS process, determine the cost and quality of 
the resulting systems, document and evaluate its 
strengths and weaknesses, and propose 
modifications to it. This paper presents the 
preliminary results of this SEL study. 

The paper examines several relevant cost issues. 

It compares the cost of investment in the GSS 
asset library to the investment in previous FDD 
reuse libraries. It compares the deployment costs 
(design, configuration and testing) of GSS-based 
applications to the development costs of previous 
FDD applications and contrasts the resulting cost 
savings with the investment cost in the GSS asset 
library. The paper also demonstrates that the GSS 
process has resulted in a significant decrease in the 
time required to field a new application. 


1 Computer Sciences Corporation, Lanham-Seabrook, Maryland 

2 Computer Sciences Dept., University of Maryland, College Park, Maryland 

3 Goddard Space Flight Center, Greenbelt, Maryland 
In addition to analyzing software metrics such as
effort and cycle time, the reuse study team 
interviewed numerous domain analysts, mission 
analysts, component engineers, application 
configurers, and application testers who have been 
involved in the GSS process. The study team 
adopted Yu’s Actor-Dependency (AD) formalism 
to model the dependence of various GSS process 
actors on other actors and resources. In order to 
further understand more complex actors in this 
process, the team applied Yu’s Agent-Role- 
Position (ARP) formalism to make explicit the 
many different roles one actor may play in the 
process. (Reference 1) 

2. History of FDD Reuse 

2.1 Environment of the FDD & SEL 

Over the past decade, the FDD of GSFC has 
usually consisted of about 100 civil servants 
supported by 300-400 CSC and subcontractor 
personnel. (In the last two years, NASA-wide 
reductions in the workforce have reduced these 
numbers somewhat.) Of these personnel, about 
40% are software developers or testers. Another 
40% are operations personnel or FDD analysts. 
The analysts are the experts in orbital mechanics, 
mathematics, or other technical disciplines who 
write the software requirements for FDD 
applications. 

The mission of the FDD is to build, deploy, and 
maintain space ground systems for NASA science 
missions, with emphasis on earth orbiting 
satellites. Flight dynamics applications are 
essentially scientific data processing systems: 
some are institutional (i.e., they support multiple 
missions) and others are mission-specific (i.e., a 
new one needs to be built for each new 
spacecraft). Each FDD application supports some 
aspect of spacecraft flight dynamics via one of 
three domains: (1) attitude determination, 4 (2) 
mission and maneuver planning, or (3) orbit and 
navigation. This paper focuses on the evolution of 
software reuse within the attitude determination 
domain of the FDD. 

The SEL is a virtual organization which consists
of civil servants from the software development
group of the FDD, CSC contractors supporting
them, and representatives from the Computer
Science Department of the University of Maryland
at College Park. The SEL has been in existence
for over 20 years, during which time it has guided,
studied, documented, and nurtured software
experimentation within the FDD. (Reference 2)

4 "Attitude" means the spatial orientation of a
spacecraft.

2.2 History of S/W Reuse at the FDD &
SEL Prior to GSS 

During the last dozen years, the SEL and the FDD 
have focused in particular on how to increase 
software reuse levels, with the expectation that this 
would reduce cost and cycle time. At the 
beginning of this experimentation, the FDD was 
developing software applications in a FORTRAN 
mainframe environment, achieving a modest level 
of reuse of very low level utilities. Through a 
series of studies, experiments, pilot projects, and 
full-fledged development projects, the SEL and 
FDD began evolving methods of software reuse. 
Efforts were focused in the attitude determination 
domain, whose class of mission-specific 
applications would benefit most from increases in 
software reuse. 

The SEL learned a great deal about using O-O and
Ada generics for one particular type of 
application, a simulation test tool whose 
development was transferred from the IBM 
mainframe to an Ada-friendly platform, the DEC 
VAX. From these experiments and mission 
projects, the SEL determined that the use of 
object-oriented principles, rather than the Ada 
language itself, was providing the primary reuse 
benefits within the FDD. (Reference 3) 

The bulk of the FDD’s mission-specific 
applications, the AGSSs, however, continued to be 
developed in FORTRAN on the IBM mainframe. 
The SEL was unable to transfer its Ada practices 
to the mainframe because adequate Ada tools for 
the mainframe environment were lacking. In lieu 
of this, the FDD applied some domain engineering 
concepts to create two FORTRAN reuse libraries 
for developing AGSSs. One library was 
developed to support AGSSs for non-spinning 
satellites, and the other for spinning satellites. 

The majority of satellites supported by the FDD, 
traditionally, are non-spinning. The FDD had 
some success with the FORTRAN reuse libraries, 
but the results were not truly “generalized” and the 
libraries grew with each new mission and became 
cumbersome to maintain. Nonetheless, these were 
all valuable experiences on which the FDD was 
able to build. 
2.3 Motivation, Goals and Definition of
GSS

Concurrent with the SEL-sponsored experiments
in O-O was a division-wide FDD initiative to
examine the possibility of generalizing all flight
dynamics software so that in future all applications
would be configured rather than developed. The
members of this team wrestled with what it means
to "configure" an application, as opposed to
"develop" an application, and came to the
conclusion that it was only possible if an FDD
reuse library were built around objects. This
decision made the O-O experiments all the more
important. Around 1989-90 the Ada/O-O
experience and the search for "configurable" flight
dynamics software applications merged and
evolved into what was to become the Generalized
Support Software (GSS) Domain Engineering
Process.


The GSS process relies upon the GSS Asset
Library, a library of generalized, configurable
application components developed by the FDD
with an object-oriented domain engineering
approach. GSS specifications adhere to a
standardized approach for specifying object-
oriented classes. This standardization allows the
use of standard rules for the implementation of
each class, including a generic detailed design for
each class and a system architecture that allows
classes to be configured into a program that
communicates with the FDD's User Interface and
Executive (UIX). By means of the GSS process,
the FDD has shifted from developing applications
to configuring applications out of generalized,
reusable assets. The term "assets" encompasses
design specifications, code components, tools, and
standards.

In 1992 the design of the GSS asset library got
into full swing, followed in early 1993 by coding
of the assets, which were implemented in the
Ada83 language and resided on a DEC Alpha
workstation. In February 1995 work began in
earnest on configuring the first application from
this asset library. To date, eight applications,
supporting two NASA satellite missions, have
been configured from the GSS asset library and
delivered to acceptance testing. These
applications run on HP or Sun workstations.

2.4 GSS as an Experience Factory

In order to carry out process improvements within
the FDD, the SEL functions as an experience
factory in relation to the project organization.
The project organization consists of FDD mission
analysts, application developers, and application
testers. The mission analysts are the FDD
personnel whose training and experience in orbital
mechanics and mathematics qualifies them to
write the requirements for FDD applications. As
the project organization goes about its business of
developing applications, the experience factory
collects metrics and lessons learned from them.
The experience factory staff stores these data in a
database, analyzes the data, suggests and conducts
additional experiments, and finally packages these
distilled project organization experiences into
recommended best practices, estimation models,
and software development training courses, which
spread these process improvements throughout the
FDD project organization. Figure 1 depicts this
traditional relationship between the project
organization and the experience factory. A heavy
dashed line separates the two groups. The light
dotted line separating the mission analysts from
the software developers on the project
organization side reflects the fact that traditionally
the SEL has not collected metrics from mission
analysts in the FDD.


Figure 1: Traditional SEL Experience Factory

With the development of the GSS Asset Library, 
the boundaries and scope of the experience factory 
appear to have expanded. New personnel, 
formerly part of the project organization, are now 
fulfilling experience-factory-type roles. Instead of 
supplying only process improvements to the FDD 
project organization, however, these people are 
also supplying product improvements to the FDD 
in the form of generalized library assets. 
Figure 2: GSS Component Development and
Application Deployment Process


Figure 2 depicts this new dimension to the 
experience factory concept at the FDD. A few 
former mission analysts have become domain 
analysts. They have designed the GSS 
architecture and written the GSS functional 
specifications for the library assets. At the same 
time several applications developers have become 
component engineers and have coded the classes 
and categories defined by the GSS functional 
specs. With these assets developed, the project 
organization then follows a streamlined process 
for application deployment. Under the new 
deployment process, a mission analyst must write 
the GSS mission specification that stipulates 
which GSS classes & categories are required for 
the application, which of the many parameters 
associated with these assets are necessary for this 
application, and what values need to be assigned 
to these parameters. This mission specification is 
passed to an application configurer — application 
developers are no longer needed — and the 
configurer then instantiates the specified objects 
from the generalized classes in the asset library 
and links them to form the desired application. 
The application testers then test the application 
and turn it over to operations. 
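
The configuration step described above can be illustrated with a short Python sketch. It is not GSS code (the GSS assets were implemented in Ada83). The class names, parameter names, and the dictionary layout of the mission specification are all hypothetical, chosen only to show how a specification that names classes and parameter values could drive the instantiation of objects from a library of generalized classes.

```python
from dataclasses import dataclass


# Hypothetical generalized classes standing in for GSS library assets.
@dataclass
class SunSensor:
    mounting_angle_deg: float = 0.0
    sample_rate_hz: float = 1.0


@dataclass
class AttitudeEstimator:
    method: str = "batch"
    max_iterations: int = 10


# Hypothetical asset library: class name -> generalized class.
ASSET_LIBRARY = {
    "SunSensor": SunSensor,
    "AttitudeEstimator": AttitudeEstimator,
}


def configure_application(mission_spec):
    """Instantiate every object named in the mission specification.

    Parameters not mentioned in the specification keep the defaults
    defined by the generalized class.
    """
    application = {}
    for object_name, entry in mission_spec.items():
        asset_class = ASSET_LIBRARY[entry["class"]]  # KeyError if not in library
        application[object_name] = asset_class(**entry.get("parameters", {}))
    return application


# Hypothetical mission specification: which classes, which parameters, what values.
mission_spec = {
    "sun_sensor_1": {"class": "SunSensor",
                     "parameters": {"mounting_angle_deg": 45.0}},
    "estimator": {"class": "AttitudeEstimator",
                  "parameters": {"method": "sequential"}},
}

application = configure_application(mission_spec)
```

In this sketch the configurer writes no new classes: the only mission-specific input is the specification itself, which is the shift from developing to configuring that the section describes.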

3. Characterization of the 
GSS Application Deployment 
Process 

A SEL Reuse Study team was tasked to analyze 
the GSS configuration process, determine the cost
and quality of the resulting application systems,
document and evaluate the strengths and 
weaknesses of the process, and propose 
improvements to it. In this section, we describe 
the preliminary results of this study of the GSS 
configuration, or application deployment, 
process, which is used to define, configure, and 
test an attitude support software application. 
Below, we describe the methods we used to 
gather and analyze this process information. In 
the sections which follow, we first characterize 
the configuration process quantitatively with 
respect to its cost, schedule, and the errors in the 
resulting applications. We then present the 
process graphically and analyze its inner 
workings. 

To model the GSS configuration process, the 
team began by studying documentation and 
holding informal discussions with managers, task 
leaders, and a few key technical personnel. At 
the same time we began to analyze SEL data on 
effort, estimates, schedules, and software changes 
related to the GSS asset library and to the software 
applications that were configured from it. As this 
metrics data analysis was proceeding, we 
conducted numerous detailed, structured 
interviews with people playing a variety of roles 
related to GSS in order to obtain information of 
sufficient detail to model the configuration 
process. 

3.1 Analysis of Metrics Data 

3.1.1 GSS Costs 

There are two relevant costs to consider when 
evaluating the GSS project. One is the cost 
associated with configuring applications from GSS 
components. Figure 3 compares the cost of 
deploying GSS-based applications to costs in the 
previous two eras, and demonstrates that GSS- 
based applications can be deployed for as low as 
10% of the cost required during the 
FORTRAN/Ada reuse era. 

Prior to 1985 it cost 58,000 hours to develop and 
test the attitude support applications for a typical 
FDD mission. Later, when the FDD was using 
Ada reuse libraries to develop simulators and 
FORTRAN reuse libraries to develop AGSSs, this 
cost dropped to 30,000 hours per mission. In both 
eras the development of the non-real-time system 
and the utilities required the most effort. 
When it came time to support the first mission
with the GSS library, the simulator was
configured first, and the real-time portion of
the AGSS was configured second. In each
case, the GSS asset library was still
undergoing redesign and growth. The
configurers were also evolving the
configuration process. Consequently, the
cost of deploying these first two applications
was more than it had been in the
FORTRAN/Ada reuse era. When the time
came to configure the non-real-time portion
of the AGSS and the utilities, the asset
library and configuration process had
stabilized. As a result, this cost only a
fraction of the typical cost from the previous
era. With the second GSS-supported
mission, we see even more dramatic savings.
The simulator and the non-real-time system
plus utilities each cost on the order of 10%
of their cost from the FORTRAN/Ada reuse
era. No real-time system was required for
this application.

[Figure 3 is a bar chart covering the pre-1985 era, the
FORTRAN/Ada reuse era, the 1st GSS mission, and the 2nd GSS
mission (no RT system), with series for Non-Real-Time System
& Utilities, Real-Time System, and Simulator.]

a TP costs removed from application costs for first 2 eras; TPs unnecessary in GSS era.
b Library maintenance costs included in 2nd era; GSS mission costs include total of 10 Khr of
GSS overhead (library maintenance, etc.)

Figure 3: Reduced Deployment Costs
Due to GSS Process

The other important cost to remember is the
initial cost of building the GSS library itself.
These costs are shown in Figure 4 alongside
the costs to develop and test the FORTRAN
and Ada reuse libraries from the previous
era. For the GSS asset library we know that
the domain analysts spent 36,000 hours
defining the requirements and the logical
design in the GSS functional specifications.
The component engineers spent 40,000 hours
creating the physical design and
implementing, inspecting, and unit testing
the generalized Ada83 classes and
categories. We know the effort
required to develop and test the
FORTRAN and Ada reuse libraries,
but we do not know the hours spent
on requirements, since traditionally
the SEL does not collect metrics from
FDD mission analysts. Even so, we
can see that the GSS library was
developed for less than the combined
cost of developing the FORTRAN
and Ada reuse libraries, which it
replaced.

Figure 4: Library Investment Costs in Two Eras

Figures 3 and 4 further demonstrate
that if the FDD continues to deploy
GSS-based applications for 10% of
the cost of the preceding era, the
FDD will recoup its entire library
investment cost of 76,000 hours by
the fourth GSS-supported mission.

[Figure 5 is a chart titled "Duration of AGSS Development"
comparing the FORTRAN/Ada reuse era (including average and
minimum) with the 1st and 2nd missions of the GSS era.]

Note: GSS era estimates assume project completions by 1/30/97

Figure 5: GSS Reduces Deployment Cycle Time
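
The break-even reasoning in Section 3.1.1 can be sketched as a short calculation. The 76,000-hour investment (36,000 + 40,000), the 30,000-hour per-mission baseline, and the 10% steady-state cost come from the text. The per-mission costs of the first two GSS missions are shown only in Figure 3 and are not stated in the text, so the `early_costs` values below are hypothetical placeholders, and the function name `break_even_mission` is ours.

```python
def break_even_mission(investment, baseline_cost, early_costs, steady_cost):
    """Return the 1-based mission number at which cumulative savings
    (baseline cost minus actual cost) first reach the library investment."""
    cumulative = 0
    mission = 0
    # Missions with known (or assumed) individual costs come first.
    for cost in early_costs:
        mission += 1
        cumulative += baseline_cost - cost
        if cumulative >= investment:
            return mission
    steady_saving = baseline_cost - steady_cost
    if steady_saving <= 0:
        raise ValueError("steady-state missions produce no savings")
    # Later missions are assumed to cost the steady-state amount.
    while cumulative < investment:
        mission += 1
        cumulative += steady_saving
    return mission


INVESTMENT = 36_000 + 40_000   # hours: domain analysts + component engineers
BASELINE = 30_000              # hours per mission, FORTRAN/Ada reuse era
STEADY = BASELINE // 10        # 10% of the preceding era's cost

# If every GSS mission had cost 10% of the baseline from the start:
ideal = break_even_mission(INVESTMENT, BASELINE, [], STEADY)

# Hypothetical costs for the first two missions (NOT taken from the paper):
with_early = break_even_mission(INVESTMENT, BASELINE, [20_000, 5_000], STEADY)
```

Under the idealized assumption the investment would be recovered on the third mission (3 x 27,000 = 81,000 hours saved). With the placeholder early costs it is recovered on the fourth, which shows how higher costs on the first missions push the break-even point later.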

3.1.2 Application Deployment Cycle Time 

The GSS process has resulted not only in a great 
reduction in the cost of deploying an application, 
but also in a significant reduction in the cycle time 
required to deploy an application. Figure 5 
reveals that the time to field an AGSS during the
FORTRAN reuse era ranged from 61 to 136 
weeks, with an average of 101 weeks. The time 
required to design, configure, and test the 
applications for the first GSS-supported mission 
was a little less than the average for the preceding 
era. The second project, however, was completed 
in less than half of the average cycle time for the 
FORTRAN/Ada era. In fact, it took less time than 
any project in the previous era. It seems likely 
that project duration can be further reduced with 
this reuse process. 

3.2 Process Diagrams 

After gaining an initial understanding of the GSS 
environment and how it is used, the team 
developed a detailed interview guide and 
conducted structured interviews with most of the 
designers, developers, configurers, and testers 
involved in the GSS processes. Once a sufficient 
body of information had been collected, we began 
to organize it by modeling the relevant processes, 
in particular the GSS configuration process. 

We chose to use Yu's Actor-Dependency (AD) 
model to portray the interactions, roles, and 
dependencies between the actors in the GSS 
processes. Figure 6 is an AD model reflecting the 
same level of detail as depicted in Figure 2. The 
AD diagram reflects how each team depends on 
other teams. The types of dependencies are 

• resource dependencies (depicted by a
rectangle), which indicate that the depender
relies on some artifact, document, or
information from the dependee;

• task dependencies (depicted by a hexagon), 
which indicate that the depender relies on the 
dependee to complete some defined set of 
steps. The dependee may or may not be 
aware of the goals of this task; 

• goal dependencies (depicted by an oval), 
which indicate that the depender relies on the 
dependee to achieve some well-defined goal. 
The dependee has a great deal of freedom to
determine how to reach that goal; and

• soft goal dependencies (depicted by a 
distorted oval, i.e., a "peanut" shape), which 
indicate that the depender relies on the 
dependee to achieve some goal which is not 
well-defined, i.e. the depender and dependee 
may not agree on, and must negotiate, exactly 
how the goal is to be satisfied. 
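
The four dependency types listed above can be captured in a small data structure. The following Python sketch is ours and is not part of Yu's formalism or of the study's tooling. The field name `dependum` (the thing depended upon) and the example dependencies are illustrative assumptions based on the process description in Section 2, not a transcription of Figure 6.

```python
from dataclasses import dataclass
from enum import Enum


class DependencyType(Enum):
    """Dependency types of the AD model, with the shape used to draw each."""
    RESOURCE = "rectangle"
    TASK = "hexagon"
    GOAL = "oval"
    SOFT_GOAL = "distorted oval"


@dataclass(frozen=True)
class Dependency:
    depender: str         # actor who relies on someone else
    dependee: str         # actor who is relied upon
    kind: DependencyType
    dependum: str         # what the depender relies on the dependee for


# Illustrative dependencies only (assumed, not copied from Figure 6).
dependencies = [
    Dependency("application configurer", "mission analyst",
               DependencyType.RESOURCE, "mission specification"),
    Dependency("application tester", "application configurer",
               DependencyType.RESOURCE, "configured application"),
    Dependency("mission analyst", "application configurer",
               DependencyType.TASK, "configure application"),
]


def depends_on(actor, deps):
    """Return the sorted names of the actors that `actor` depends on."""
    return sorted({d.dependee for d in deps if d.depender == actor})


configurer_needs = depends_on("application configurer", dependencies)
```

Representing the model as plain records makes simple queries possible, such as listing every actor a given team relies on.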

The following AD diagrams focus more on the 
GSS application configuration process and show 
the relevant roles and dependencies at a lower 
level of detail. 

Figure 7 expands the complex social actors of 
Figure 6 into their substructure of agents, roles, 
and positions. Agents are actual, physical people 
and groups of people that actors represent. Roles 
indicate what parts of the process an actor is 
involved in. Positions are the organizational titles 
and jobs that an actor holds. Positions generally 
“cover” one or more roles, while roles are 
“played” by an agent, who also “fills” one or more 
positions. In Figure 7, only some of the relevant 
dependencies are shown and (for the most part) 
are not identified by type in order to simplify the 
diagram. 
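
The agent, role, and position relationships described above ("covers", "plays", "fills") can be sketched as follows. This is a simplified illustration of ours, not the study's model: it assumes an agent plays exactly the roles covered by the positions it fills, and the agent, position, and role names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str


@dataclass
class Position:
    title: str
    covers: list = field(default_factory=list)   # roles covered by this position


@dataclass
class Agent:
    name: str
    fills: list = field(default_factory=list)    # positions filled by this agent

    def plays(self):
        """Roles played by the agent, derived from the positions it fills
        (a simplifying assumption of this sketch)."""
        seen = []
        for position in self.fills:
            for role in position.covers:
                if role not in seen:
                    seen.append(role)
        return seen


# Hypothetical example: one analyst filling one position that covers two roles.
write_spec = Role("write mission specification")
review_results = Role("review test results")
analyst_position = Position("mission analyst", covers=[write_spec, review_results])
agent = Agent("Analyst A", fills=[analyst_position])

role_names = [role.name for role in agent.plays()]
```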

Figure 8 shows, at a high level, the sequences of 
tasks that must be completed in order to configure 
a GSS application, and the inputs and outputs of 
those tasks. Tasks are represented as ovals and 
artifacts (inputs and outputs) as rectangles. Many 
of the tasks refer to task dependencies in Figure 6. 
Figure 6. Actor-Dependency (AD) Model of GSS Application Deployment Process

4. Recommendations for
Improvements to the GSS
Configuration Process

As is often the case, organizational and technical 
details which were overlooked at the project’s 
inception have come back in various forms to 
threaten the full success of GSS. Despite dramatic 
reductions in application deployment cost and 
cycle time, the GSS process has not won the full 
support of all groups within the FDD. Although 
FDD management mandated that software 
developers and analysts would jointly design the 
GSS process, the resulting process is today viewed 
by many as the child of the software developers, 
with less than full partnership from the analysts. 

But this is more than merely a perception. The 
current GSS process provides a good tool that 
allows traditional software developers to quickly 
configure flight dynamics software applications. 

At the same time, however, the current GSS 
process contains hurdles for mission analysts, 
whom FDD management would like to see making 
more direct use of the GSS. This is because the
GSS process and the GSS documentation are 
inherently more understandable to the GSS 
developers and configurers than to the majority of 
FDD mission analysts. As discussed later, the 
writing of the initial mission specification in 
particular is a task logically performed by mission 
analysts, but at this time it requires a very 
technical level of understanding of GSS. This 
level of understanding is very difficult, and not 
necessarily appropriate, for analysts to achieve. As 
a result of this, relatively few FDD analysts are 
currently involved in the GSS process. 

As a result of our in-depth characterization of the 
GSS configuration process, we discovered several 
opportunities for improvement. Some of these 
were synthesized from the comments of several 
interviewees, while others came directly from GSS 
developers, configurers, and testers. Most relate
to the problem described above (the barriers to
use by analysts), but would also improve the GSS
process in other ways.

4.1 Storing application requirements

Several problems were cited that might be 
ameliorated by storing the information contained 
in the mission specification in database form. 

First of all, it would facilitate the reuse of
requirements, which is common from one
application to another. Instead of manually
editing reused parts lists, display files, parameter 
files, etc., database operations could be used to 
modify these elements in the database to help 
ensure consistency and avoid errors. 
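
As a rough illustration of this idea, the sketch below stores requirements in an in-memory SQLite database through Python's standard `sqlite3` module, copies one application's requirements to a new application, and then changes a single value. The table layout, application names, and parameter names are hypothetical; the paper does not propose a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE requirement (
        application TEXT NOT NULL,
        asset_class TEXT NOT NULL,
        parameter   TEXT NOT NULL,
        value       TEXT NOT NULL,
        PRIMARY KEY (application, asset_class, parameter)
    )""")
conn.executemany(
    "INSERT INTO requirement VALUES (?, ?, ?, ?)",
    [("mission_a_agss", "SunSensor", "mounting_angle_deg", "45.0"),
     ("mission_a_agss", "AttitudeEstimator", "method", "batch")])


def reuse_requirements(conn, source_app, target_app):
    """Copy every requirement of source_app to target_app."""
    conn.execute(
        "INSERT INTO requirement (application, asset_class, parameter, value) "
        "SELECT ?, asset_class, parameter, value "
        "FROM requirement WHERE application = ?",
        (target_app, source_app))


def set_parameter(conn, app, asset_class, parameter, value):
    """Change one value; fail loudly if the requirement does not exist."""
    cursor = conn.execute(
        "UPDATE requirement SET value = ? "
        "WHERE application = ? AND asset_class = ? AND parameter = ?",
        (value, app, asset_class, parameter))
    if cursor.rowcount != 1:
        raise KeyError((app, asset_class, parameter))


reuse_requirements(conn, "mission_a_agss", "mission_b_agss")
set_parameter(conn, "mission_b_agss", "SunSensor", "mounting_angle_deg", "30.0")

rows = conn.execute(
    "SELECT asset_class, parameter, value FROM requirement "
    "WHERE application = ? ORDER BY asset_class, parameter",
    ("mission_b_agss",)).fetchall()
```

The primary key and the row-count check are the kind of consistency safeguards that manual editing of text files does not provide.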

Secondly, it has been stated as a goal of GSS that 
eventually mission analysts should be able to 
configure attitude software with little or no 
intervention from GSS developers. There are 
several barriers to achieving this goal, one of 
which is that the writing of the mission 
specification seems to require very specialized 
skills. This is more than a user interface problem, 
but using a database format rather than a textual 
one may help. 

Designing and maintaining a database for mission 
and application requirements would not be a 
simple task. It would require the borrowing or 
hiring of a specialist in database design, and a 
careful analysis of the needs that the database is 
meant to satisfy. Because of some of the points 
discussed above, a database system with an 
adequate user interface is especially important. 
Also, it would be helpful to be able to integrate 
this database with other databases used in the 
environment, e.g. databases used to store new 
component information. 

4.2 Automatic generation of configuration
inputs 

Another advantage of storing mission-specific 
information in a database is that it would facilitate 
the automatic generation of some of the inputs to 
the GSS configuration. Generating these files at 
present is tedious and time-consuming. Writing 
the parts list in particular has been described as a 
translation of the mission specification from one 
notation to another. Such a translation could be 
automated if the mission specification were stored 
electronically. Even better, the tools which 
process the parts list could be rewritten so that 
they access the database directly. As mentioned 
later, such a database could also facilitate the 
automatic generation of some parts of the user’s 
guide. It is also conceivable that a database of
application requirements could be used to
automatically generate the artifacts needed as
input to UIX (the user interface facility), including 
the display files, the parameter files, and the 
message files. 
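
The translation from a stored mission specification to a parts list could look roughly like the Python sketch below. The actual GSS parts list notation is not given in this paper, so the `PART` and `SET` line format, like the structure of `mission_spec`, is invented purely for illustration.

```python
def generate_parts_list(mission_spec):
    """Render a structured mission specification as parts list text.

    Objects and parameters are emitted in sorted order so that the
    same specification always yields the same output.
    """
    lines = []
    for object_name in sorted(mission_spec):
        entry = mission_spec[object_name]
        lines.append(f"PART {object_name} : {entry['class']}")
        for parameter, value in sorted(entry.get("parameters", {}).items()):
            lines.append(f"  SET {parameter} = {value}")
    return "\n".join(lines)


# Hypothetical specification content (names are not taken from GSS).
mission_spec = {
    "sun_sensor_1": {"class": "SunSensor",
                     "parameters": {"mounting_angle_deg": 45.0}},
    "estimator": {"class": "AttitudeEstimator",
                  "parameters": {"method": "sequential"}},
}

parts_list = generate_parts_list(mission_spec)
```

The same approach would apply to other generated inputs, such as display or parameter files, each with its own rendering function over the shared specification data.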
4.3 Support for learning GSS

As mentioned earlier, the specialized skills 
required for writing mission specifications seem to 
be a barrier to making GSS usable by mission 
analysts. Making the mission spec database-based 
rather than a textual document may help 
somewhat. However, it does not solve the root 
problem, which is that writing the mission 
specification involves choosing the proper 
configuration of GSS components for a particular 
mission. This requires a level of understanding of 
the GSS architecture that, up until now, mission 
analysts have been unable or unwilling to attain. 
This problem has both organizational and 
technical aspects. Analysts were not involved 
enough in the development of GSS to give them 
any sense of ownership. Thus, they are not highly 
motivated to take the time necessary to learn to 
use GSS. Motivation is further inhibited because, 
up until now, one particular analyst has been 
willing to take on the task of writing mission 
specifications for all missions using GSS-based 
software. From a technical point of view, the
current GSS documents (the GSS
functional specifications) are written by and for
software developers, not mission analysts. Their 
size and technicality are daunting, to say the least, 
and their organization is closely tied to the 
organization of the software, which is not 
necessarily the most logical from a user’s point of 
view. 

Thus, if GSS is to achieve the goal of being fully 
usable by mission analysts, a serious effort must 
be made to support learning. There is a growing 
area of research and development in software 
engineering in object-oriented frameworks; for 
example, the SEL is studying learning and reading 
techniques for frameworks (Reference 4). GSS fits 
the definition of an O-O framework, which is a
domain-specific repository of software classes
which fit into a cohesive architecture designed
specifically for the domain. To the best of our
knowledge, GSS is the only O-O framework
specific to the flight dynamics domain. However, 
much of what has been learned about how to 
support the learning of frameworks in other 
domains could be applicable here. A number of 
strategies have been used: cookbooks of 
application templates and variations, example 
applications, documented class hierarchies, etc. 
One approach may be to develop a scenario-driven 
overlay for the GSS functional specifications 
which helps organize the specifications according 
to user scenarios. Many of these techniques could
be useful in helping mission analysts understand
GSS sufficiently to begin producing their own 
applications. 

Designing learning support materials for GSS 
would involve some experimentation to determine 
which strategies are most helpful for mission 
analysts. This would require some investment of 
time and resources, and a serious commitment to 
finding an appropriate solution for the FDD 
domain and organization. It is also crucial that the
support materials be designed for the most part by
mission analysts, not software developers. The
involvement of members of the analyst branch of 
FDD is necessary to ensure that the materials, and 
GSS, will be used in the future. 

4.4 User's Guide 

User’s guides are required to be delivered to the 
acceptance testers with the application, but they 
are usually not completed until well after that 
point. Testers usually do not have them available 
in time to help with testing at all. Instead, they 
rely for the most part on the mission specification. 
However, the testers did not seem to see this as a 
big problem. The configurers, on the other hand, 
were not highly motivated to write user’s guides 
and it was treated as a necessary but low-priority 
chore. A suggested improvement, then, is first to 
determine what information is really useful in the 
user’s guide (for both testers and eventual users), 
then to investigate the possibility of automatically 
generating parts of the user’s guide from the 
mission specification (this might be facilitated by 
the database suggested earlier), and finally, if 
necessary, assign a qualified technical writer to 
take on the writing of user’s guides, as a task apart 
from configuration of the application. 

5. New Directions for Reuse 
Study 

Having characterized the GSS process, the Reuse 
Study Team will concentrate in the coming months 
on putting this process into perspective, 
particularly with respect to its changing technical 
and organizational context. First of all, a number 
of technological advances have taken place in 
software engineering since the inception of GSS. 
These advances may be relevant to how GSS is 
used in the future. Furthermore, some 
developments in the marketplace have produced 
alternative approaches to reuse. Some of these 
may be appropriately used instead of GSS in some 
cases. The focus of the Reuse Study Team in the
near term will be to study which of these emerging 
technologies could best be incorporated into GSS 
and how, and under what conditions GSS could be 
supplanted with technology that is now available 
elsewhere. We hope to evolve guidelines to be 
used by FDD mission teams in choosing how best 
to produce their software applications. In the 
sections below we outline some of the issues on 
which we will concentrate. 

5.1 Evolving Technologies 

Over the years that the GSS has been evolving, 
many technologies have been evolving in the 
marketplace. Some of these technologies require a 
second look to see how they compare to the GSS 
process today. It may be that the GSS process 
could benefit from incorporating some of these 
technologies. 

5.1.1 Object Orientation 

The GSS assets have been built from an object-oriented 
perspective since the inception of GSS. In many 
ways, the development of GSS was ahead of its 
time, in that tools and techniques for developing 
object-oriented systems were not available when 
the GSS team needed them. For example, the only 
object-oriented programming languages that were 
available at the inception of GSS were Ada83 and 
Smalltalk. Now, other languages are available, 
such as C++ and Ada95, along with supporting 
tools. We will consider whether or not GSS 
suffered from not having these languages and tools 
available, and if any of the currently available 
languages and tools might be useful in the future 
maintenance of GSS. The software engineering 
field also knows more now about such topics as 
object-oriented design, testing, and maintenance. 
New advances need to be examined to determine 
their applicability to GSS. 

5.1.2 Graphical User Interfaces 

A User Interface and Executive (UIX) was 
developed by a separate group of FDD 
developers, in parallel with GSS, to provide GUI 
capability for GSS-based applications. It was 
decided to develop the GUI capability in-house 
because, at that time, no appropriate GUI 
packages were available in the commercial 
market. That is no longer the case, so it is 
appropriate to compare UIX to what is currently 
available commercially, off-the-shelf (COTS). It 
may be cost-effective to replace UIX with a more 
user-friendly and robust GUI capability developed 
elsewhere. 

5.1.3 Other COTS Products 

To support the GSS process, a number of tools 
have been developed in-house, such as code 
generators and editors. Most of these were 
developed in an ad hoc (as needed, as time 
permitted) manner. As the sophistication and 
quality of currently available COTS products have 
risen, we will investigate whether some could be 
used to support the GSS process. Some COTS 
products may even be appropriate to replace the 
GSS process in some cases, as discussed below. 

5.2 Alternative Reuse Processes 

For several years, the FDD has been slowly 
developing more and more software on UNIX 
workstations and weaning itself from its traditional 
reliance on the IBM mainframe. In the 1990s the 
FDD began to develop some of its attitude support 
software for execution on UNIX workstations 
rather than on the IBM mainframe computer. For 
example, the AGSSs supporting the three most 
recent operational satellites (SOHO, SWAS, and 
XTE) ran partly on the IBM mainframe and partly 
on the UNIX workstations. Since the FORTRAN 
reuse libraries resided only on the mainframe, the 
subsystems based on the workstations had to be 
written essentially from scratch. The GSS 
strategic reuse library was designed entirely for 
UNIX workstations, and would have been useful 
for these subsystems, but it was not yet available. 

The movement from the mainframe to 
workstations received a big impetus near the end 
of fiscal year 1995, when FDD management 
mandated that all software would be removed 
from the IBM mainframe computers by the end of 
fiscal year 1996. Consequently, much of the 
institutional and mission-specific FORTRAN code 
on the IBM mainframes needed to be ported to 
workstations in a hurry. 

It was initially decided that the mainframe 
portions of the three most recent operational 
AGSSs would be re-implemented on the 
workstations by configuring them from the GSS 
library. In order to continue supporting the older 
legacy missions, however, an alternative method 
was sought. Since these AGSSs were built 
primarily from the FORTRAN reuse libraries and 
ran entirely on the mainframe, it was decided to 
port these libraries to the workstations. 

The FORTRAN reuse library used for supporting 
non-spinning satellites was rehosted by two 
mission analysts with considerable support from 
some COTS products. FORTRAN subroutines 
were edited using word processors in order to 
conform to language restrictions of the COTS 
products. The analysts followed some process 
shortcuts and made liberal use of certain language 
features provided by the COTS products. During 
this rehost, the library specifications were not 
rigorously followed and were not updated to 
reflect the rehosted version of the library. Another 
FORTRAN reuse library, used to support spinning 
satellites, was rehosted by software developers, 
using the same COTS products. However, they 
closely followed the library specifications and 
made little attempt to take advantage of language 
features unique to the COTS products. 

The analysts who rehosted the first library enjoyed 
using the COTS product and demonstrated that the 
rehost could be done cheaply and quickly. They 
found that they had a lot of control over the 
process and were able, because of their position, 
and/or the features of the COTS products, to 
rapidly make changes to the library during the 
rehost. As a result of their favorable experience, 
the rehosted libraries, together with their COTS 
umbrella, are now viewed as an alternative process 
for supporting new FDD missions as well as 
legacy missions. 

In addition to these COTS products used for 
rehosting attitude determination systems, there are 
additional COTS products that can meet various 
other parts of typical FDD mission requirements. 
Some of these products are already being 
reviewed and adopted to support 
mission/maneuver planning and orbit/navigation 
requirements for upcoming FDD missions. 

The Reuse Study Team has been charged with 
studying the processes associated with the 
maintenance and reuse of GSS, as well as those 
that utilize the rehosted FORTRAN reuse libraries 
in the development of mission support software. 
Our work thus far has resulted in a detailed 
understanding of the GSS configuration process, 
described in the previous sections. As well, we 
have come to some understanding of the questions 
around which to focus this comparison. These 
questions represent some points of disagreement 
between COTS and GSS proponents, some 
concerns raised by developers and users of both 
approaches, and our own analysis of interview 
data. These questions are presented in the 
sections below. 

5.2.1 User Interface 

GSS uses a unified user interface called UIX for 
all applications. UIX was developed in-house, in 
parallel with GSS. This has caused some 
problems in the testing of GSS, when errors turn 
out to be UIX errors, not errors in the GSS code. 
The use of UIX also requires the handling and 
formatting of a number of large files (parameters, 
displays, messages) in configuring an application, 
which can be tedious and error-prone. 

Many COTS products provide their own GUI 
capability, which is used to create a user interface 
for each application. This interface is not 
necessarily consistent. 

How important is a unified user interface? How 
difficult would it be to unify all the COTS-based 
user interfaces? 

5.2.2 Is Object-Oriented Technology Superior? 

The rehosted libraries are written in a procedural 
language associated with the COTS products used 
to support the rehost, in some cases from scratch 
and in others converted from FORTRAN code 
using a text editor. GSS applications are mostly 
Ada83 with a small amount of C code in some 
cases. Thus, the GSS library is based on O-O 
concepts, whereas the rehosted libraries, and their 
related applications, are not. Prior to GSS, the 
SEL determined that the use of Ada and O-O 
concepts in the FDD resulted in smaller systems that 
performed more functionality, while the FORTRAN 
reuse libraries continued to grow in size. 

Since they are based on FORTRAN, will the 
rehosted reuse libraries continue to have the same 
disadvantages (in particular, code growth) as did 
the original FORTRAN libraries? If so, this 
makes the FORTRAN libraries a less attractive 
choice compared to O-O Ada reuse libraries. Or is 
there some attribute of the COTS products or the 
rehosting process which mitigates these 
disadvantages? 

5.2.3 Software Engineering Practices 

The design of the rehosted libraries relies heavily 
on the use of Global COMMON data. The 
software elements of the resulting applications are 
very tightly coupled to these data structures. Also, 
as mentioned earlier, one of the rehosted libraries 
has a code structure which mirrors the original 
FORTRAN structure very closely. Some 
developers also expressed concern that the 
rehosting efforts did not follow standard software 
engineering practices, such as inspections. On the 
other hand, it could be argued that rehosting does 
not warrant such a high process overhead because 
it is based on software that has been in operation 
for a long time. 

GSS, on the other hand, was developed in 
accordance with more modern O-O concepts and 
practices. A rigorous software engineering 
process was followed, including design and code 
inspections and rigorous testing. 

Does the use of O-O concepts and software 
engineering practices really make a difference in 
this case? Or does the fact that the rehosted 
software is based on such a time-tested library 
make up for its deficiencies in this area? 
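
The coupling concern raised in this subsection can be made concrete with a small sketch. The example below is illustrative only: it is written in Python rather than in FORTRAN or the COTS language, all names are hypothetical, and the arithmetic is a placeholder rather than real attitude mathematics. It contrasts routines that communicate through shared global state (loosely analogous to a COMMON block) with a routine whose inputs are passed explicitly.

```python
from dataclasses import dataclass

# Style 1: routines communicate through shared module-level state,
# loosely analogous to a FORTRAN COMMON block (hypothetical names).
COMMON = {"spin_rate": 0.0, "sun_angle": 0.0}


def load_telemetry_common(spin_rate, sun_angle):
    """Write inputs into the shared area for later routines to read."""
    COMMON["spin_rate"] = spin_rate
    COMMON["sun_angle"] = sun_angle


def estimate_common():
    """Hidden dependency: the result depends on whatever was last
    written to COMMON, which the call site does not reveal."""
    return COMMON["spin_rate"] * COMMON["sun_angle"]  # placeholder math


# Style 2: the same placeholder computation with its inputs made explicit.
@dataclass(frozen=True)
class Telemetry:
    spin_rate: float
    sun_angle: float


def estimate(telemetry):
    """All inputs are visible in the signature; no shared state is read."""
    return telemetry.spin_rate * telemetry.sun_angle  # placeholder math


load_telemetry_common(2.0, 3.0)
print(estimate_common())
print(estimate(Telemetry(spin_rate=2.0, sun_angle=3.0)))
```

In the first style, every routine that reads the shared area must change if its layout changes, which is the tight coupling described above. Whether that matters for a time-tested library is exactly the open question.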

5.2.4 Maintenance 

Both FDD COTS users and GSS proponents stress 
the advantages of their respective approaches for 
maintenance. The systems based on the rehosted 
libraries are argued to be easily and quickly 
modified by someone who is familiar with the 
domain, but not necessarily with software 
development. That is, an analyst does not have to 
rely on a software developer to make every change 
required. Using a GSS-based application, on the 
other hand, requires a delay whenever a change is 
requested, often until the next release of the GSS 
library. Thus using the MATLAB-based rehosted 
libraries provides users much quicker turnaround 
time on modifications of the application than does 
using GSS. 

GSS proponents argue, on the other hand, that any 
system will degrade over time if it is allowed to be 
changed unsystematically by users. Also, the 
structure of GSS was designed to facilitate change 
without adding complexity or large amounts of 
new code. 

Is it more important for the user to have quick 
turnaround on requested changes, or to manage the 
evolving structure of the software? Is there a 
reasonable compromise between the two? Do the 
COTS-based applications become more difficult 
to maintain the larger the application is? Does the 
design of GSS really ensure that it will not 
degrade over time? 


Are developers and analysts using different time 
scales (e.g., is "quick" 1 hour for an analyst, but 1 
day for a developer)? Are developers and 
analysts looking at different scopes of the 
modification process (e.g., a developer looks at 
how quick it is to change the code, whereas an 
analyst looks at how long he has to wait to get the 
revised application)? 

5.2.5 Performance 

The applications based on the rehosted libraries 
are interpreted, not compiled. In some cases the 
source code was automatically converted to C, 
then compiled. This compilation step improves 
processing speed by a factor of two, but the resulting 
applications are still slower than traditional FDD applications. 
How much slower are the COTS-based 
applications than GSS-based applications, and is 
this difference noticeable or important to users? 

5.2.6 Reliability 

The AGSSs based on the rehosted libraries rely 
heavily on the intrinsic capabilities of the 
underlying COTS software for performing a 
number of mathematical manipulations. Care 
must be taken to separate out errors in the COTS 
software from errors in the custom developed 
portions of the code. GSS components, on the 
other hand, have exhibited very low defect levels 
in acceptance testing. No applications of either 
approach, however, have been operational for long 
enough to assess field reliability. 

What assurances do we have of the reliability of 
COTS products? How can it be assessed? 

5.2.7 Portability 

The applications based on the rehosted libraries 
are all designed to be part of a single system using 
the GUI provided by the COTS product used in 
the rehost. This makes porting the components 
relatively easy for any target platform which 
supports that product. On the other hand, there 
were some difficulties recently in porting one of 
the GSS-based AGSSs from the HP to the Sun 
workstations because UIX (the user interface 
which GSS uses) had not previously been ported 
to the Sun. 

How important a criterion is portability? Can UIX 
and GSS be made more portable in the future? 

5.2.8 Documentation 

During the porting of one of the FORTRAN 
libraries, the original FORTRAN code structure 
was followed very closely. Thus, the original 
specifications for the FORTRAN software are still 
valid for the rehosted version. However, none of 
the advanced features of the COTS products were 
used which would have allowed a more efficient 
restructuring of the code. These features were 
used heavily in the porting of the other 
FORTRAN reuse library. As a consequence, the 
code is more compact than it was, but the original 
software specifications are no longer valid and no 
new specifications have been written. The analysts 
who were responsible for porting the libraries 
believe that, to a certain extent, a separate 
specifications document becomes less necessary 
because in the programming language used 
(associated with the underlying COTS products), 
the equations are written exactly as they would be 
written in the specification. 

The design of the GSS system is documented in 
the GSS functional specifications, but these are 
1600 pages long and, as mentioned earlier, are a 
real barrier to understanding the system for its 
eventual intended users, mission analysts. 
However, they seem to provide all relevant 
information necessary for maintaining the GSS 
components, and are written from a software 
developer’s point of view. 

Is either type of documentation sufficient for 
operation and maintenance purposes? Is the 
COTS-based code really self-documenting enough 
for maintainers to correctly make modifications? 
Can users of GSS components and applications be 
taught to use the GSS specifications effectively? 

6. Conclusions 

This paper presents the interim results from the 
SEL’s Reuse Study. The team conducting this 
study has, over the past few months, been studying 
the GSS domain asset library and architecture, and 
the various processes associated with it. In 
particular, we have characterized the process used 
to configure GSS-based attitude ground support 
systems to support FDD missions. To do this, we 
built detailed models of the tasks involved, the 
people who perform these tasks, and the 
interdependencies and information flows between 
these people. These models were based on 
information gleaned from numerous interviews 
with people involved in this process at various 
levels. We also analyzed effort data in order to 
determine the cost savings in moving from actual 
development of AGSSs to support each mission 
(which was necessary before GSS was available) 
to configuring AGSS software from the domain 
library. 
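
The break-even reasoning behind such a cost comparison can be sketched as follows. This is a minimal illustration, not the SEL's analysis: the function name is ours, the effort figures are hypothetical values in arbitrary units, and the sketch assumes a constant per-mission cost before and after the library became available.

```python
import math


def break_even_missions(library_investment, cost_before, cost_after):
    """Return the number of missions after which cumulative per-mission
    savings first cover the up-front investment in the reuse library."""
    savings_per_mission = cost_before - cost_after
    if savings_per_mission <= 0:
        raise ValueError("configuring must cost less than developing")
    return math.ceil(library_investment / savings_per_mission)


# Hypothetical figures (arbitrary effort units), not SEL data.
print(break_even_missions(library_investment=36.0,
                          cost_before=10.0,
                          cost_after=1.0))
```

With these made-up inputs the investment is recovered on the fourth mission. The real answer depends on the measured effort data.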

While characterizing the GSS process, we also 
became aware of several interesting factors which 
affect the successful continued use of GSS. Many 
of these issues fall under the subject of the 
evolving technologies, which were not available at 
the inception of GSS, but are now. Some of these 
technologies could be incorporated into the GSS 
process, thus making the whole asset library more 
usable. Other technologies are being considered 
as an alternative to the GSS process altogether. In 
this paper, we outline some of the issues we will be 
considering in our continued study of GSS and the 
impact of evolving technologies. 

FDD Environment 


Size: 100 civil servants, 300-400 contractors 

Mission: Deploy mission-critical applications 
for NASA space ground systems 


■ 3 Software Domains 

♦ Attitude Determination 

♦ 200-300 KSLOC attitude ground support systems (AGSS) 

♦ 40-70 KSLOC telemetry simulators 

♦ Mission/Maneuver Planning 

♦ Orbit and Navigation 

SEL sponsored experimentation in O-O/Ada83 

♦ Telemetry simulators (40-70 KSLOC) on VAX 

♦ Application-specific architectures 

♦ High reuse levels for telemetry simulators (>90%) 


FORTRAN AGSSs (200-300 KSLOC) 

♦ Unable to adopt Ada on mainframe -- lack of tools 
♦ Some success with domain engineering (~70% reuse) 

Evolution of GSS 



*Generalized Support Software (GSS): a library of generalized, 
configurable application components developed with an object- 
oriented domain engineering approach. 

Communications and Control Model 

FDDS/GSS Architecture 


The GSS Architecture Hierarchy 



Object: a model of some individual 
item of interest in the problem domain. 

Applications 
Reuse Library 

Class: a generalized object 


Category: a set of similar classes grouped 
together along with rules for using these member 
classes for mission support. 



Subdomain: a group that contains all 
categories necessary to specify the functionality 
in a specific high-level area of the overall 
problem domain. 


OOPSLA '96 - October 10, 1996 

[Figure: The GSS as an Experience Factory, showing the Project Organization and the Experience Factory] 

Deployment savings likely to recoup GSS investment by 4th mission. 

Issues with the GSS Process 


■ GSS viewed as a “child” of the S/W developers. 

■ Can’t write the (GSS) mission spec without 
understanding the GSS functional specs. 

■ The GSS functional specs (1600 pages) are written 
by and for developers -- not for analysts. 

■ Very few analysts involved in GSS process. 

■ Many analysts cool towards GSS. 


Potential Improvements 
for the GSS Process 


■ Create a database for mission requirements 
(text-based now) in order to reduce 
mission spec effort. 

■ Automate the generation of mission 
specifications and configuration inputs. 

■ Create a scenario-driven overlay -- designed 
by analysts -- for the functional specs. 

Evolving Technologies 


■ O-O languages evolving: Ada83 -> C++ and Ada95 


♦ GSS Attitude Subdomain in Ada83 

♦ GSS Mission Planning Subdomain in C++ 

■ O-O design techniques evolving 

♦ use cases (scenario-driven) 

■ Marketplace GUIs more advanced now 

■ COTS products more powerful, more varied 



Alternative Reuse Processes 
Now Available 


■ FORTRAN reuse libraries were rehosted to 
workstations using COTS products; 
can support future missions as well. 

■ Other COTS products being used for mission 
support. 

■ New missions can choose GSS and/or COTS. 

Understanding Alternative 
Reuse Processes 


■ Would GSS benefit from a different GUI? 

■ Does O-O tech in GSS make it more robust or 
maintainable than non-O-O COTS products? 

■ Other maintenance issues 

■ Performance 

■ Reliability 

■ Portability 

■ Documentation 



♦ Deployment time. 

♦ Application deployment costs -> 10% of pre-GSS costs. 

♦ Recoup library investment in 4 missions? 

■ GSS not designed for FDD analysts 

♦ Functional specs, mission specs, configuration process 

♦ Mods needed to make GSS process more useful to analysts. 

■ Alternative reuse processes now available. 

■ More work needed to compare and assess GSS and 
COTS. 

1. Introduction 

Software reading is a key technical activity that aims at achieving whatever degree of understanding is 
needed to accomplish a particular objective. The various work documents associated with software 
development (e.g., requirements, design, code, and test plans) often require continual understanding, review 
and modification throughout the development life cycle. Thus software reading, i.e., the individual analysis 
of textual software work products, is the core activity in many software engineering tasks: verification and 
validation, maintenance, evolution, and reuse. 

Through our work in the SEL, we have evolved our understanding of reading technologies via 
experimentation. We have run empirical studies ranging from blocked subject-project experiments (reading 
by step-wise abstraction vs. functional and structural testing [Basili, Selby87]) to replicated projects 
(University of Maryland Cleanroom study [Selby, Basili, Baker87]) to a case study (the first SEL Cleanroom 
study) to multi-project variation (the set of SEL Cleanroom projects [Basili, Green94]) and most recently, 
back to blocked subject-project experiments (scenario-based reading vs. current reading [Basili, Green, 
Laitenberger, Lanubile, Shull, Soerumgaard, Zelkowitz96], [Porter, Votta, Basili95]). 

We have used a variety of experimental designs to provide insight into the effects of different variables 
on reading. The experiments are based upon the ideas that reading is a key technical activity for improving 
the analysis of all kinds of software documents and that we need to better understand its effect. We believe 
these studies demonstrate the evolution of knowledge about reading, experimentation, and the packaging of 
experimental results over time. Several of these experiments have been replicated by other researchers. 

To provide a technological base to software reading, we attempt to develop specific reading techniques, 
made up of a concrete set of instructions which are given to the reader on how to read or what to look for in 
a software work product. Our current research efforts focus on the development of families of reading 
techniques, based on empirical evaluation. Each family of reading techniques can be parameterized for use 
in different contexts and must be evaluated for those contexts. 

The taxonomy of reading families is shown in Figure 1. The upper part of the tree (over the dashed 
horizontal line) models the problems that can be addressed by reading. Each level represents a further 
specialization of the problem according to classification attributes which are shown in the rightmost column 
of the figure. For example, reading (technology) can be applied for analysis (high level goal), more 
specifically to detect faults (specific goal) in a requirements specification (document) which are written in 
English (notation/form). 

The lower part of the tree (below the dashed horizontal line) models the specific solutions we have 
provided to date for the particular problems, represented by each path down the tree. The solution space 
consists of reading families and reading techniques. Each family is associated with a particular goal, 
document or software artifact, and notation in which the document is written. Each technique within the 
family is: (1) tailorable, based upon the project and environment characteristics; (2) detailed, in that it 
provides the reader with a well-defined set of steps to follow; (3) specific, in that the reader has a particular 
purpose or goal for reading the document and the procedures that support the goal; and (4) focused, in that it 
provides a particular coverage of the document, and a combination of techniques in the family provides 
coverage of the entire document. Finally, each technique is being studied empirically to determine if and 
when it is most effective. 

Figure 1. Families of reading techniques 


Each software life cycle phase contains both construction and analysis activities. The design phase, for 
example, is responsible for creating design documents, as well as for analyzing their quality. Since 
construction and analysis are two parts of the same phase, we can learn about construction technologies 
from analysis technologies. At a high level, we divide reading activities into Reading for Analysis and 
Reading for Construction, to parallel this distinction between analysis and construction processes and to 
show that the usefulness of good reading techniques is not limited to any narrow portion of the software 
life-cycle. The next two sections describe our work in these areas. 

2. Reading for Analysis 

Reading for analysis is aimed at answering the following question: Given a document, how do I assess 
various qualities and characteristics? Reading for analysis is important for product quality; it can help us 
understand the types of defects we make, and the nature and structure of the product. It can be used for 
various documents throughout the life cycle. Besides helping us assess quality, it can provide insights into 
better development techniques. 

Our research into reading for analysis has so far emphasized defect detection; we have focused on the 
requirements phase for this purpose. We have generated two families of reading techniques (collectively 
known as scenario-based reading), by creating operational scenarios which require the reader to first create 
an abstraction of the product, and then answer questions based on analyzing the abstraction with a particular 
emphasis or role that the reader assumes. Each reading technique in a family can be based upon a different 
abstraction and question set. The choice of abstraction and the types of questions may depend on the 
document being read, the problem history of the organization, or the goals of the organization. 

The first family of scenario-based reading techniques is known as defect-based reading, and focuses on 
a model of the requirements using a state machine notation. The different model views are based upon 
focusing on specific defect classes: data type inconsistency, incorrect functions, and ambiguity or missing 
information. The analysis questions are generated by combining and abstracting a set of questions that are 
used in checklists for evaluating the correctness and reliability of requirements documents. 

The second family of techniques, perspective-based reading, focuses on different product perspectives, 
e.g., reading from the perspective of the software designer, the tester, the end-user, the maintainer, the 
hardware engineer. The analysis questions are generated by focusing predominantly on various types of 
requirements errors (e.g., incorrect fact, omission, ambiguity, and inconsistency) by developing questions 
that can be used to discover those errors from the one perspective assumed by the reader of the document 
(e.g., the questions for the tester perspective lead the reader to discover those requirement errors that could 
be found by testing the final product). 

In order to understand the effectiveness of scenario-based reading techniques in particular, we have 
experimentally studied techniques from both families. The first series of experiments 
[Porter, Votta, Basili95], [Basili, Green, Laitenberger, Lanubile, Shull, Soerumgaard, Zelkowitz96] was 
aimed at discovering if scenario-based reading is more effective than current practices. Based upon these 
experiments, we had empirical evidence that scenario-based reading techniques can improve the 
effectiveness of reading methods. At the same time, we noted that some scenarios were less effective than 
others. We give some details of these experiments here in order to illustrate our own experiences with 
experimentation in software engineering. 

2.1 Defect-Based Reading Experiment 

In the defect-based reading study [Porter, Votta, Basili95], we evaluated and compared defect-based 
reading, ad hoc reading and checklist-based reading, with respect to their effect on fault detection 
effectiveness in the context of an inspection team. The study, a blocked subject-project, was replicated 
twice in the spring and fall of '93 using 48 graduate students at the University of Maryland. The 
experimental design was a partial fractional factorial design. The design was less elegant than the 
[Basili, Selby87] design because the comparison here is with the status quo approach (ad hoc) or with a less 
procedurally organized approach (checklists), so it is impossible to teach a subject a defect-based reading 
approach and then return to ad hoc or checklist reading. In this case, an ordering was assumed. On the first 
pass there were more ad hoc and checklist readers. Several, but not all, were moved to defect-based reading 
on the second pass. 

Major results were that: 

• the defect-based readers performed significantly better than ad hoc and checklist readers; 

• the defect-based reading procedures helped reviewers focus on specific fault classes but were no 
less effective at detecting other faults; and 

• checklist reading was no more effective than ad hoc reading. 

2.2 Perspective-Based Reading Experiment 

In the perspective-based reading study [Basili, Green, Laitenberger, Lanubile, Shull, Soerumgaard, 
Zelkowitz96], we evaluated and compared perspective-based reading and NASA’s current reading 
technique with respect to their effect on fault detection effectiveness in the context of an inspection team. 
Three types of perspective-based reading techniques were defined and studied: tester-based, designer-based, 
and user-based. The study, again a blocked subject-project, was run twice in the SEL environment with 
NASA professionals. 

The design evaluated the effectiveness of perspective-based reading on both domain-specific and 
generic requirements documents, which had been constructed expressly so that the generic portion could be 
replicated in a number of different environments, while the domain-specific part could be replaced in each 
new environment. This would allow us to combine the generic parts from multiple studies but focus on 
improvement local to a particular environment. Based on feedback from the subjects and other difficulties 
encountered in the first run of the experiment, we were able to make changes to the experimental design that 
improved the second run. For example, we found it necessary to: 

• Include more training sessions, to make certain that subjects were familiar with both the documents 
and techniques involved in the experiment; 

• Allow less time for each review of the document, since subjects tended to tire in longer sessions; 

• Shorten some of the documents, to reach a size that could realistically be expected to be checked in 
an experimental, time-constrained setting. 


Major results of this experiment were that: 

• both team and individual scores improved when perspective-based reading was applied to generic 
documents; and 

• team scores improved when perspective-based reading was applied to NASA documents. 


Although the true benefit of PBR is expected to be seen at the level of teams which combine several 
different perspectives for improved coverage, the results for individuals showed that the use of PBR may 
lead to improvements at the individual level as well. Thus, we further analyzed the individual reviewers’ 
performance with the generic documents considering other attributes of effectiveness. Preliminary results of 
this second-round analysis were that: 

• PBR reviewers took more time than reviewers using their current reading technique, but the average 
cost for finding a defect was the same for both methods. 

• The percentage of false positives was about the same for both methods (there were fewer false 
positives with PBR, although the difference was not significant). 


If we consider that PBR reviewers found more defects than reviewers using their current reading 
technique and assume that the cost of finding a defect increases as more defects are found, we can conclude 
(for generic documents) that: 

• PBR is actually more productive than the local reading technique. 

• The relative effort spent fixing defects is better for PBR. 
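
The productivity argument above can be illustrated with a small worked example. The sketch assumes a linear cost model in which the k-th defect found costs `b * k` hours; the model, the defect counts, and the average cost are hypothetical values chosen only to show the reasoning, not measurements from the experiment.

```python
def base_cost_for(average_cost, n_defects):
    """Solve for b in the assumed model cost(k) = b * k, given the average
    cost per defect over the first n_defects defects.

    The average of b*1, b*2, ..., b*n is b * (n + 1) / 2.
    """
    return 2 * average_cost / (n_defects + 1)


def total_cost(base_cost, n_defects):
    """Total effort to find the first n_defects defects under the model."""
    return sum(base_cost * k for k in range(1, n_defects + 1))


# Hypothetical inputs: same average cost per defect, different defect counts.
AVERAGE_COST = 2.0   # hours per defect, identical for both techniques
LOCAL_DEFECTS = 4    # defects found with the local technique
PBR_DEFECTS = 6      # defects found with PBR

local_base = base_cost_for(AVERAGE_COST, LOCAL_DEFECTS)
pbr_base = base_cost_for(AVERAGE_COST, PBR_DEFECTS)

# Compare the two techniques at the SAME number of defects found.
local_cost_at_4 = total_cost(local_base, LOCAL_DEFECTS)
pbr_cost_at_4 = total_cost(pbr_base, LOCAL_DEFECTS)
```

Under these assumptions, the technique that finds more defects at the same average cost must have the lower base cost, so it reaches any given number of defects with less effort. This is the sense in which equal average cost combined with more defects found suggests higher productivity.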


By also tailoring the perspectives to the NASA application domain, we should be able to improve 
individual performance on these tasks. We need to improve the treatments used in the reading techniques. 
This can be done by developing questions for each scenario using the specific application domain (e.g., 
flight dynamics requirements documents), by focusing on the abstraction mechanism used (e.g., using a 
specific technique like equivalence partition testing for the testing perspective), or by focusing the questions to 
cover certain classes of defects more effectively. 

We need to add a qualitative component to the controlled studies to gather more insights into what is 
needed to better set up the experiment, define the terminology, and interpret the results. For example, 
controlled experiments could be supplemented with various standard methods in qualitative analysis such as 
the use of pre-tests, post-tests, ethnographic studies, and interviews. 


3. Reading for Construction 

Reading for construction is aimed at answering the question: Given an existing system, how do I 
understand how to use it as part of my new system? Reading for construction is important for 
comprehending what a system does, what capabilities exist and do not exist; it helps us abstract the 
important information in the system. It is useful for maintenance as well as for building new systems from 
reusable components and architectures. 

Our emphasis so far has been on the reuse of an existing system or library. Reusing class 
libraries does increase quality and productivity, but class libraries provide only low-level functionality 
rather than default system behavior, and they force the developer to supply the interconnections between the 
libraries. Greater benefits can be expected from reusable, domain-specific architectures and components 
that are of sufficient size to be worth reusing. Thus, we are currently focusing on the reuse allowed by 
object-oriented frameworks [Lewis95]. 

Since a framework provides a pre-defined class hierarchy, object interaction, and thread of control, 
developers must fit their applications into the framework. This means that in framework-based 
development, the static structure and dynamic behavior of the framework must first be understood and then 
adapted to the specific requirements of the application. It is assumed that the effort to learn the framework 
and develop code within it is less than the effort required to develop a similar system from scratch. 
Although it is recognized that the effort required to learn enough about the framework to begin coding is 
high [Booch94], [Pree95], [Taligent95], little work has been done on reducing this learning 
curve. 

3.1 White-Box Frameworks 

We are studying the process of learning such a framework (or more generally, any unfamiliar system) 
and developing constructive reading techniques that may minimize the effort expended on program 
understanding in particular situations. This experiment involves the study of a white-box framework, which 
defines a set of interacting classes, usually abstract classes, that capture the invariants in the problem 
domain. Since the source code of the classes is accessible to the programmer, a white-box framework can be 
specialized by deriving application-specific classes from the base classes through inheritance and by 
completing or overriding their methods [Johnson, Foote88], [Schmid96]. Learning to use a white-box 
framework therefore amounts to learning how it is constructed, because the user must have detailed 
knowledge of the framework code. 
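
As a minimal sketch of white-box specialization, the example below shows a framework base class that owns the thread of control and an application class that completes one method and overrides another. The class and method names are hypothetical and are not taken from ET++ or any other real framework.

```python
from abc import ABC, abstractmethod


class FrameworkEditor(ABC):
    """Hypothetical framework base class; it owns the thread of control."""

    def run(self, events):
        # The framework decides when application code is called.
        log = [self.handle(event) for event in events]
        log.append(self.redraw())
        return log

    @abstractmethod
    def handle(self, event):
        """Must be completed by the application-specific subclass."""

    def redraw(self):
        # Default behavior that a subclass may override.
        return "redraw: blank canvas"


class DiagramEditor(FrameworkEditor):
    """Application-specific class derived through inheritance."""

    def handle(self, event):
        # Completing the abstract method.
        return f"diagram handles {event}"

    def redraw(self):
        # Overriding the framework default.
        return "redraw: boxes and arrows"


log = DiagramEditor().run(["click", "drag"])
```

To write `DiagramEditor` correctly, the developer has to know that `run` calls `handle` once per event and `redraw` afterwards, which is the kind of detailed code knowledge the text refers to.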

We have defined two reading techniques for using a white-box framework to build new applications: a 
system-wide reading technique and a task-oriented reading technique. Both techniques look at the static 
structure and the run-time behavior of the framework, and both have access to the same sources of 
information. The main difference is the focus of the learning process: the system-wide technique focuses 
more on the big picture than on the detailed task to be accomplished (which is the focus of the task-oriented 
technique). 

With the system-wide reading technique, programmers attempt to gain a broad knowledge of the 
framework design. As a consequence, they deliver the functionality required by the new application mainly 
by specializing the abstract classes of the framework. With the task-oriented reading technique, 
programmers use existing framework-based applications as examples and attempt to gain a specialized 
knowledge of the parts which are directly relevant for the required system. As a consequence, they deliver 
the functionality required by the new application mainly by changing the concrete classes of the examples. 

To compare these two techniques we have conducted a repeated-project experiment, in which we 
present graduate students and upper-level undergraduates with an application task to be developed using the 
white-box framework ET++ [Lewis95]. ET++ is a sophisticated framework that poses learning problems 
which can be major inhibitors against its use. The overall goal of the experiment is to compare the reading 
techniques for framework understanding (system-wide and task-oriented) with respect to their effect on ease 
of framework learning and usage, i.e., the ease with which the framework is understood and functionality is 
added. Students receive separate lectures on the reading techniques and work in teams of three people. One 
half of the class has been taught the system-wide reading technique and the other half the task-oriented 
reading technique. Preliminary results show that: 

• Even a relatively well-designed, although poorly documented, framework presents many difficulties 
in learning how to derive framework-based applications. 

• Students demonstrated an overhead in learning the framework, with high levels of frustration in the 
early weeks because of the investment in time without an immediate payoff in programming. 

• Students found it easier to learn in the beginning by reading and reusing example applications than 
by trying to first gain a comprehensive knowledge of the framework. 

• Difficulties were encountered with the system-wide technique because the documentation provided 
was at an insufficient level of detail to be useful, and because the technique gave little guidance as 
to which area of the framework to concentrate on first. 

• Difficulties were encountered with the task-oriented technique because it was hard to find suitable 
examples for all required functionality and because example applications were sometimes 
inconsistent in terms of structure and organization. 

3.2 Black-Box Frameworks 

Black-box frameworks allow an application to be created by composing objects rather than by 
programming [Johnson, Foote88], [Schmid96]. They provide alternative concrete classes which have to be 
selected when creating an application, allowing some variability in the applications created. Thus, a 
black-box framework is customized by selecting, parameterizing, and configuring a set of components that 
provide the application-specific behavior. 

The interface between components can be defined by protocol, so the user needs to understand only the 
external interface of the components. Since this does not require knowledge of the framework code, 
black-box frameworks could be considered easier to use than white-box frameworks. However, better 
documentation and training are required because developers cannot look at the source code to determine 
what is going on. 
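
A minimal sketch of black-box customization follows: ready-made components that share an external interface are selected by name, parameterized, and composed, without deriving new classes. The component names, the catalog, and the configuration format are hypothetical and do not describe GSS or any other real framework.

```python
from typing import Protocol


class Component(Protocol):
    """External interface that every component agrees to (the 'protocol')."""

    def process(self, value: float) -> float: ...


class Scale:
    """Ready-made component: multiplies its input by a configured factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def process(self, value: float) -> float:
        return value * self.factor


class Offset:
    """Ready-made component: adds a configured amount to its input."""

    def __init__(self, amount: float):
        self.amount = amount

    def process(self, value: float) -> float:
        return value + self.amount


# Hypothetical catalog of the concrete classes the framework offers.
CATALOG = {"scale": Scale, "offset": Offset}


def configure(spec):
    """Select and parameterize components from (name, parameters) pairs."""
    return [CATALOG[name](**params) for name, params in spec]


def run(pipeline, value):
    """Compose the configured components by passing a value through each."""
    for component in pipeline:
        value = component.process(value)
    return value


app = configure([("scale", {"factor": 2.0}), ("offset", {"amount": 1.0})])
result = run(app, 3.0)
```

The application is defined entirely by the configuration list, so the user needs to know what each component does and which parameters it takes, but not how it is implemented.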

We intend to investigate reading techniques for black-box frameworks in a real development context, 
focusing on the Generalized Support Software (GSS), a black-box framework developed and used to enable 
much more rapid deployment of flight dynamics applications at NASA/GSFC. The process for configuring 
a new mission-support application with GSS consists of selecting GSS classes to compile and link together, 
and setting values for a large number of control and operational parameters. The size and sophistication of 
the reuse asset library poses learning problems which can be major inhibitors against its use. Here, the goal 
is to improve the existing reading techniques which are used to understand which generalized components 
must be configured in order to develop new applications. 

Variations of the two reading techniques that were compared in the ET++ experiment (system-wide and 
task-oriented) will have to be designed for use with a black-box framework. The idea behind each reading 
technique will be the same, however. One will require the framework user to learn the overall structure of 
the framework, while the other will help the user learn from specific examples. 

4. Conclusions 

Much of our work in reading has so far focused on three families of reading techniques: 

1. the defect-based reading family for analyzing requirements specifications written in SCR notation, 
with the purpose of defect detection; 

2. the perspective-based reading family for analyzing requirements specifications written in English, 
with the purpose of defect detection; 

3. the scope-based reading family for constructing applications through reuse of white-box 
frameworks. 


We will continue to conduct empirical studies which will allow us to closely monitor the use of different 
reading techniques in laboratory and real projects, both quantitatively and qualitatively. We believe it is 
necessary to integrate results from both types of studies in order to gain a deeper understanding of the 
research questions. 

As our ability to understand software reading as a technique evolves, we plan to develop other families 
of reading techniques parameterized for use in different contexts and empirically evaluated for those 
contexts. 

Reading Motivation 


Reading is a key technical activity 

for analyzing and constructing software documents 

We need to evolve reading technology 

by improving the analysis of all kinds of software documents 

What is software reading? 

the individual analysis of a textual software product 
e.g., requirements, design, code, test plans 
to achieve the understanding needed for a particular task 
e.g., defect detection, reuse, maintenance 

We have evolved our understanding of reading technology in the SEL 
via a series of experiments 
from the early reading vs. testing experiments 
to various Cleanroom experiments 

to the development of new reading techniques currently under study 
Reading Research 


What is a reading technique? 

a concrete set of instructions given to the reader 

saying how to read and what to look for in a software product 

Our current research efforts are to 
develop families of reading techniques 
based on empirical evaluation 
parameterized for use in different contexts 
evaluated for those contexts 

In this talk we discuss 

a taxonomy of reading families 

specific techniques and experimental evaluations 

where we are going in our research program 
Families of Reading Techniques 



[Figure: taxonomy of reading technique families] 


High Level Reading Goals 


We differentiate two goals for reading techniques: 

Reading for analysis: 

Given a document, 
how do I assess various qualities and characteristics? 

Assess for 
product quality 
defect detection 

Useful for 
quality control 
insights into development 

Reading for construction: 

Given a system, 
how to use it as part of my new system? 

Understand 
what a system does 
what capabilities do and do not exist 

Useful for 
maintenance 
building systems from reuse 

Reading for Analysis: Perspective-Based Reading Experiment 

Goal of Perspective-Based Reading (PBR): 

detect defects in a requirements document 
focus on product consumers 


Controlled experiment run twice with NASA professionals: 



Reading for Analysis: Defect-Based Reading Experiment 

Goal of Defect-Based Reading (DBR): 

detect defects in a requirements document 
focus on defect classes 


Controlled experiment run twice with UMD graduate students: 
Experiments with Reading for Analysis 
More Results from the PBR Analysis 


Generic Domain at the Individual Level: 

PBR found more defects than the local Reading Technique 

PBR took more time than the local Reading Technique 

And the average cost for finding a defect is the same for both methods 

Assuming that cost of finding a defect increases as more defects are found 


[Figure: detection effort (vertical axis) versus # defects found (horizontal axis)] 

Might imply: PBR is more productive than the local Reading Technique 

V.R. BASILI - SEL-21 




Experiments with Reading for Analysis 


More Results from the PBR Analysis 

Generic Domain at the Individual Level: 

PBR found more defects than the local Reading Technique 
The percentage of false positives for both methods is about the same 



Might imply: Relative effort spent fixing defects later is better for PBR 
Reading for Construction 


Interested in reading techniques 

to minimize the effort to learn a new tool or existing system 
for a specific application development 

Framework 

A set of classes augmented with a built-in model for defining how 
classes interact 

to reuse domain concepts 
to encapsulate implementation details 


[Figure: Framework (domain specific) combined with Custom Software (application specific)] 


Two approaches: 

White-box frameworks - extend and modify classes 
Black-box frameworks - select and configure ready-made classes 



Experiments with Reading for Construction 
White-Box Frameworks 

We proposed two reading techniques emphasizing different facets of the 
framework: 


System-wide technique: 
study classes 

gain a broad knowledge of the 
framework design 
build system by choosing 
appropriate classes 


Task-oriented technique: 
study examples 

gain a specialized knowledge of 
directly relevant system parts 
build system by modifying 
examples 


Experimental design: 

Repeated project - 45 subjects - 15 three person teams 
Environment: 

University of Maryland upper-level software engineering course 
Project: developing an OMT diagram editor - GUI framework ET++ 

Experiments with Reading for Construction 


Preliminary Results: White-Box Framework Experiment 

Students demonstrated an overhead in learning the framework 

- High levels of frustration in the early weeks, 

investment in time doesn’t yield immediate payoff in programming 

- Even a relatively well-designed* framework presents many difficulties 

*(but poorly documented) 

Learning curve seems worse for system-wide technique 

- More difficult to know which areas of framework to concentrate on first 

- Learning appears more difficult without example-based learning 


Questions: 

How prescriptive should the technique be? 
How do we evolve these techniques? 



Experiments with Reading for Construction 
Experiment with Black-Box Frameworks (GSS) 

We need to support analysts' ability to understand and use GSS 


We hope to learn more about 

understanding and using black box frameworks to configure new systems 
based upon our studies with white box frameworks 

For example: 

Do analysts learn differently from developers? 

Would analysts do better configuring systems based on: 

system-wide approach: 
learning specifications/categories to gain broad knowledge 
configuring new systems based on the specifications and categories 

task-oriented approach: 
taking examples (e.g., past similar systems) 
modifying the specification of the old system 
Conclusion 


We have developed three families of reading techniques 
parameterized for use in different contexts and 
evaluated experimentally in those contexts 


Scope Based: System Wide, Task Oriented 

Defect Based: Inconsistent, Incorrect, Omission, Ambiguity 

Perspective Based (or Role Based): Tester, User, Developer 



Long Range Research Plan 


We need to 

Develop better empirical evaluation methods to study these techniques 
in the laboratory and in industrial settings 

Provide an Experience Base of technology evaluations that can be added to 
by other researchers and practitioners based upon their experiences with 
the technologies 

Develop other families of reading techniques 

and then 

Develop families of other techniques 
based on empirical evaluation 
parameterized for use in different contexts 
evaluated for those contexts 
Session 2: Process 


Software Development Technology Evaluation: Proving Fitness-for-Use with 
Architectural Styles 
J. Cusick and W. Tepfenhart, AT&T 


Systematic Process Improvement in a Multi-Site Software Development Project 
H. Hientz, G. Smith, A. Gustavsson, P. Isacsson and C. Mattsson, Q-Labs GmbH 


An Empirical Study of Process Conformance 
S. Sorumgard, Norwegian University of Science and Technology 
Software Development Technology Evaluation: 
Proving Fitness-for-Use with Architectural Styles 


1. OVERVIEW 

A cursory glance at a few trade journals will indicate that hundreds if not thousands of 
development tools are available on the market. Today, with the boom in Internet technologies, 
dozens of new tools enter the market place each month. Faced with this situation we were asked 
to define how to choose the best tools for use in the development of hundreds of AT&T’s business 
applications. Starting in early 1995 we began a revitalization of the software tool assessment 
practices of AT&T and especially AT&T’s Network Services Division (NSD). These efforts are 
discussed in this paper. 

An evaluation methodology was developed based on the concept of fitness-for-use as measured 
by the construction of architecturally representative applications within a laboratory environment. 
This method was used to evaluate dozens of commercial software development tools in order to 
select specific tools as corporate-wide standards. 

This work presents the specifics of our software technology evaluation methodology, including our 
research efforts, tool taxonomy, and evaluation procedures (especially our use of software 
architecture-style-derived certifying test suites). This paper does not present the specific tools 
selected through the application of this methodology. 


2. SOFTWARE TECHNOLOGY EVALUATION 

Many evaluation techniques are known and meet with varying levels of success. Weighted 
averaging, benchmarking, figures of merit, etc., each have certain advantages and disadvantages 
(Kontio, 1995). Our approach is instead centered on the concept of demonstrated fitness for use 
in the environment of choice as measured by the applicability of any given tool to the dominant 
software architectures found within the target business environment. This approach reflects the 
“habitat models” suggested by Brown (1996). 

This approach stems from viewing the evaluation of software in terms of one question: How well does the 
provided functionality of a product span the needs associated with tasks to be performed using it? 
Evaluation is highly dependent on the use for which the product is intended and the results are 
subject to greater ambiguity than evaluations of other classes of products. Many manufacturers of 
software products will be more than happy to provide metrics for common performance criteria. 
Other questions are more subtle: Does the tool provide the right abstractions? Is it easy to use? 
Does it take one hour to do something or ten days? It is these subtle metrics that we intended our 
evaluation environment to measure, and for this we turned to Architecture Styles. 


3. SOFTWARE ARCHITECTURE STYLES 

A year-long study of our software systems identified (at least) four basic architectural styles 
present in our business applications (Belanger et al., 1996). These styles are: transaction, data 
streaming, real time, and decision support. These styles consistently appeared, in part and in full, 
in a wide variety of systems including those for Financial, Maintenance, Provisioning, and Asset 
Management domains. We say “in part” because a majority of our systems are actually hybrids of 
these different architectural styles. 

We eventually derived several certification applications from these styles in order to drive our 
evaluation process. Our core reasoning was that the development of small-scale applications 
modeled after our target development tasks would prove the suitability of the product under 
evaluation. This turned out to be true for virtually all the products we evaluated. The entire process, 
in which the architecture styles play a key role, is now presented in detail. 


4. THE EVALUATION PROCESS 

Our approach to evaluating software technology is to appraise technology as “fit-for-use” if we can 
succeed in developing a sample application which has a reasonable similarity to our production 
applications. In other words, we use the product under evaluation in an environment modeled 
after the target development environment. The process can be summarized in the following 
manner: 

1. Survey the available products 

2. Classify according to a technical framework 

3. Filter the list using screening criteria 

4. Construct evaluation criteria templates 

5. Use the target tools to build an Architecturally Representative Application 

6. Record findings against the templates 

7. Judge the best scores and select the recommended product 

4.1 Survey the available products 

The overall evaluation process begins with surveying the tool market for candidate products and 
classifying them according to a technical framework sometimes called a taxonomy. Consider the 
survey effort first. 

Initial research into software tool availability, capabilities, and trends, can be both rewarding and 
daunting. The goal of tool research is to identify all or most of the tools currently available for the 
support of a particular stage of the software development process. This research is technical in 
that one must understand the technological capabilities of each tool. At the same time, this 
research is market oriented in that one must also understand trends and supplier positioning. 
Some of the techniques used in this activity include: 

• Literature Reviews: Books, journals, trade press publications. Key information on 
technical capabilities, product announcements, corporate changes, tool assessments 
and recommendations are readily available. 

• Trade Shows and Technical Conferences: We have found trade shows to be 
decreasingly helpful in identifying technologies of interest. This is due to the generally 
poor level of technical information available at such venues. Technical conferences 
on the other hand remain helpful in putting the available products into a theoretical or 
practical context. 

• Direct Mail: Believe it or not this is an effective means for collecting information once 
you are on enough mailing lists. (This may not be ecological but it is economical in 
terms of time; it only takes a few seconds to sort incoming product information.) 

• Automated Topic Searches: We receive weekly or monthly summaries extracted 
from current publications on software technologies and trends via email. 

• Web Browsing: This has become a significant source of information and freeware 
tools. We maintain a list of vendor web sites and this has often provided up to the 
minute information on particular products. 

• Vendor Demonstrations: Slicing through the sales pitch to the technical meat is 
often difficult but this remains an effective means of collecting detailed product 
knowledge for selected tools. 

• Evaluation Copies: A time or event determined interval of hands-on experience, 
execution, and utilization of the tools is invaluable in understanding actual tool 
capabilities (this is discussed in detail below). 

• Professional Information Services: Several organizations are under contract to us 
providing strategic information on the software industry. This information is often 
helpful but can also be factually incorrect or misleading. These sources are useful 
more as sounding boards than anything else. 

• Private Contact Network: Having a wide network of software professionals to draw 
upon for knowledge of the industry and technology cannot be overlooked in research 
efforts. For example, teaching a continuing education course at a local university has 
brought several new tools to our attention through conversations with students. 

• Experience: Having been around the development community for a number of years 
directly impacts your ability to scan and decipher information on tools. Oftentimes 
“new” tools end up being familiar tools refaced. 

• Project Reference: Having access to the real-life trials of hundreds of development 
projects, we know early on what is needed, what works, and what delivers less than 
advertised. 

The output of this research includes summary information on current product availability, industry 
trends, software standards and standards activities, computing techniques and methods, and 
development resources both internal and external. The specific products or technologies identified 
during our research efforts are given an initial classification in the tool and technology taxonomy 
discussed next. 


4.2 Classify according to a technical framework 

A Software Development Environment (SDE) can be viewed as an integrated set of tools and 
processes enabling analysts, designers, programmers, and testers to collaborate on the 
production of high quality software solutions. Traditional Software Engineering Environment (SEE) 
frameworks support the concept of creating an SDE by creating a view of the computing 
infrastructure as a unified and sensible environment with specified functional interrelationships 
instead of just a random assortment of tools (Brown, 1992). 

Unfortunately, SEEs are not well suited to the task of tool classification since they are operational 
in nature. We required a classification scheme to build our SDE recommendations that could be 
used to organize toolsets of an eclectic nature resulting from our market research. Existing tool 
taxonomies (Kara, 1995; Fugetta, 1993; Sharon, 1993) typically focused on particular application 
domains, limited platforms, or were designed to cover only CASE tools. Since these taxonomies 
did not meet the needs of our scope (multi-platform, process driven tool standards), we derived 
our own classification for software tools. 

To begin with, our classification scheme inherited some structure from our corporate context. 
Domains typical of most software engineering environments sometimes fall outside of our mission 
charter. For example, operating systems, databases, and communications protocols are defined 
by other AT&T teams. Our mission was limited to a constrained view of Application Development 
technologies. 

We decided to base our tool classification on an existing software engineering framework (Utz, 
1992) and then modify it as needed (see Figure 1). The major categories provided by Utz are 
redefined by us below. Each of these major categories is further divided into sub-categories. 
Representative sub-categories are shown in Table 1. As our market research efforts turn up tools, 
we categorize them in the taxonomy. Currently we have approximately 1,000 tools in a database 
organized by these categories. This database allows us to perform ad hoc queries on tool use 
within AT&T and to quickly produce candidate lists when evaluation efforts are begun. 

Figure 1: Software Engineering Environment Framework as Tool Taxonomy 


4.2.1 The framework categories defined 

• Process Management: Tools supporting the specification, implementation, and 
compliance management of development processes. 

• Management & Metrics: Tools supporting the planning, tracking, and measuring of 
software development projects. 

• Requirements Definition: Tools supporting the specification and enumeration of 
requirements. 

• Analysis & Design: Tools supporting high level design and modeling of software 
system solutions following specific formal methodologies and often including code 
generation and reverse engineering capabilities. 

• Implementation (Code/Debug): Tools supporting both low-level code 
implementation, covering the edit-compile-debug cycle of development in 3GLs, and 
visual programming targeted at rapid application development, using screen 
painters/generators with graphical palettes of reusable GUI components in 4GLs. 

• V&V: Tools providing software verification and validation, quality assurance, and 
quantification of reliability. These include test case management, test selection, and 
automated test support. 

• Release & Support: Tools targeted at supporting enhancements and corrections to 
existing code as well as browsers, source code analyzers and software distribution. 

• Content Creation: Tools used for developing Internet materials such as electronically 
published documents, graphics and multimedia components of Internet sites. 

• Documentation: Tools supporting creation and distribution of system documentation, 
specifications, and user information. These tools include documentation storage, 
retrieval, and distribution. 

• Software Configuration & Manufacturing: A broad class of tools related to the 
control of software components and development artifacts including documentation 
for the purpose of team based programming, versioning, defect tracking, and 
software manufacturing and distribution. 

Process 
    Process Definition & Compliance 

Project Planning & Metrics 
    Project Planning 
    Function Points 
    General Metrics 

Requirements & Definition 
    Requirements Trace 

Analysis & Design 
    Object Oriented Analysis & Design 
    Structured or Other Design Methods 
    RDBMS Modeling 

Implementation 
    Languages 
    Editors 
    Compilers & Debuggers 
    IDEs 
    GUI/Visual Development 
    Cross Platform Development 
    Database Development 
    Components 

Verification & Validation 
    Test Management & Design 
    Record & Playback 
    Stress, Load & Performance 
    Coverage 

Release & Support 
    Distribution 
    Reverse Engineering 
    Emulation 
    Utilities 

Content Creation 
    Web Document Authoring 
    Graphics Authoring 
    Multimedia Authoring 

Documentation & Workflow 
    System Documentation 
    Help Authoring 
    Workflow 

Software Configuration Management 
    Source Code Control 
    Defect Tracking 
    Configuration or Manufacturing 
    Integrated SCM 


Table 1: Selected Tool Taxonomy Sub-Categories 
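
For bookkeeping, the sub-categories of Table 1 could be held in a simple lookup structure so that each surveyed tool is filed under exactly one category. The sketch below is only an illustration in Python; the names `TAXONOMY` and `category_of` are hypothetical and not part of the process described here.

```python
from typing import Optional

# Table 1 expressed as a mapping from category to its sub-categories.
TAXONOMY = {
    "Process": ["Process Definition & Compliance"],
    "Project Planning & Metrics": ["Project Planning", "Function Points", "General Metrics"],
    "Requirements & Definition": ["Requirements Trace"],
    "Analysis & Design": ["Object Oriented Analysis & Design",
                          "Structured or Other Design Methods", "RDBMS Modeling"],
    "Implementation": ["Languages", "Editors", "Compilers & Debuggers", "IDEs",
                       "GUI/Visual Development", "Cross Platform Development",
                       "Database Development", "Components"],
    "Verification & Validation": ["Test Management & Design", "Record & Playback",
                                  "Stress, Load & Performance", "Coverage"],
    "Release & Support": ["Distribution", "Reverse Engineering", "Emulation", "Utilities"],
    "Content Creation": ["Web Document Authoring", "Graphics Authoring",
                         "Multimedia Authoring"],
    "Documentation & Workflow": ["System Documentation", "Help Authoring", "Workflow"],
    "Software Configuration Management": ["Source Code Control", "Defect Tracking",
                                          "Configuration or Manufacturing", "Integrated SCM"],
}


def category_of(sub_category: str) -> Optional[str]:
    """Return the category that owns a sub-category, or None if it is unknown."""
    for category, sub_categories in TAXONOMY.items():
        if sub_category in sub_categories:
            return category
    return None
```
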


4.3 Filter the list using screening criteria 

With a thousand tools in the taxonomy, we have to trim the list whenever a particular 
technology sub-category must be evaluated. Many candidate tools can be eliminated using basic 
technical requirements. Lack of platform support, negative reviews in the trade press, vendor 
instability, or financial losses by a vendor can all be used to quickly eliminate products from the 
evaluation list. If negative criteria do not narrow the list enough, we use positive criteria: is the 
tool an “Editor’s Choice”, or does our development community already use it as a de facto standard? 
Tools that meet these criteria stay on the evaluation list, while the others are dropped. 
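
The screening step can be pictured as a two-stage filter. The following is a minimal sketch under stated assumptions: the `Candidate` fields, the `screen` function, and the `max_size` threshold that decides when positive criteria are needed are all hypothetical illustrations, not the procedure actually used in the laboratory.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    platforms: set                    # platforms the tool supports
    negative_press: bool = False      # negative trade-press reviews
    vendor_unstable: bool = False     # vendor instability or financial losses
    editors_choice: bool = False      # positive criterion
    de_facto_standard: bool = False   # positive criterion


def screen(candidates, required_platform, max_size=5):
    """Apply negative criteria first; fall back to positive criteria if the list is still long."""
    survivors = [
        c for c in candidates
        if required_platform in c.platforms
        and not c.negative_press
        and not c.vendor_unstable
    ]
    if len(survivors) <= max_size:
        return survivors
    # Negative criteria were not enough: keep only tools meeting a positive criterion.
    preferred = [c for c in survivors if c.editors_choice or c.de_facto_standard]
    # Assumption: if nothing meets a positive criterion, keep the survivors unchanged.
    return preferred or survivors
```

The threshold is a design choice for the sketch only; in practice the analyst decides when the list is short enough to evaluate.
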


4.4 Construct evaluation criteria templates 

Each of the tool categories in the taxonomy needs specific evaluation criteria to measure the 
relevant attributes of each tool type in our taxonomy. Towards that end a set of templates must be 
developed for each type of technology evaluated. These templates resemble the ones found in 
many trade journals and bench-marking reports. The following must be created or reused: 

1. First, one overall template for generic tool and vendor measurement is provided. This 
generic template covers such items as documentation, support, pricing, and platform 
availability. A standard set of issues regarding tools such as iconic design, menu 
features, ergonomics, printing, and so on, is included. 

2. Each analyst must then define a specific template which covers the technical aspects 
of the particular class of tool under investigation, if it does not already exist in our 
repository of templates. This must be created for each category. 
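
One way to picture the two templates is as a generic list of criteria that is extended with a category-specific list drawn from a repository. This is a sketch only; `GENERIC_CRITERIA`, `TEMPLATE_REPOSITORY`, `get_template`, and the example specific criteria are hypothetical names chosen for illustration.

```python
# Generic tool and vendor criteria named in item 1 above.
GENERIC_CRITERIA = [
    "documentation", "support", "pricing", "platform availability",
    "iconic design", "menu features", "ergonomics", "printing",
]

# Hypothetical repository of category-specific criteria, keyed by tool category.
TEMPLATE_REPOSITORY = {}


def get_template(category, specific_criteria=None):
    """Return the generic criteria followed by the category's specific criteria.

    If the category has no template yet, the analyst must supply one,
    which is then stored in the repository for reuse.
    """
    if category not in TEMPLATE_REPOSITORY:
        if specific_criteria is None:
            raise KeyError(f"no template for {category!r}; the analyst must define one")
        TEMPLATE_REPOSITORY[category] = list(specific_criteria)
    return GENERIC_CRITERIA + TEMPLATE_REPOSITORY[category]
```

A first call such as `get_template("RDBMS Modeling", ["reverse engineering", "schema generation"])` would register the specific template; later calls for the same category reuse it.
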


4.5 Use target tools to build Architecturally Representative Applications 

Recall that we are interested in demonstrating “fitness-for-use”. To do this we now build a 
representative application with the product(s) selected for evaluation from the taxonomy. Before 
evaluating any software technology we must first consider what capabilities it has, how to 
construct a suitable test suite, and whether our current set of application specifications will need 
expansion. 


4.5.1 Technologies and Their Tasks 

Each type of software product dictates certain kinds of tasks that will be the subject of evaluation. 
For example, word processors might be evaluated in terms of developing on-line (in program) 
documentation, help files, man pages, hard copy user manuals, and HTML documents. On the 
other hand, one would not evaluate a compiler in terms of its support of those same tasks. In 
some cases, products span more than one functional category. For example a C++ IDE might 
provide a visual programming environment, a class system, and a general purpose compiler. 
Since each of these is a separate endeavor, an evaluation of a C++ IDE will concentrate, 
independently, on the visual programming environment, class system completeness, and compiler 
performance. These are individual and discrete evaluations. Each will need specific resources to 
carry out the evaluation. 


4.5.2 Software Resources for Evaluation 

The software resources required to complete the data collection demanded by the evaluation 
template fall into three categories: 1) the software under evaluation; 2) supporting software (i.e., 
the operating system); and 3) software in the form of test cases (e.g., a sample design to 
implement). As we have shown, common architectures run through most AT&T applications. Our 
concept was to derive the required test cases from these architecture types or patterns. 

Software patterns (Gamma, 1995; Coplien, 1995) formalize some of the concepts on recurring 
underlying software construction themes. We devised evaluation test cases to demonstrate that 
any tool recommended supported AT&T’s specific computing problem domains. Thus we 
developed and specified a set of representative applications modeled after architectural styles or 
patterns observed in the field, to serve as certifying test suites for any tool slated for review (see 
Table 2). 


[Table 2 body not reproduced: a matrix marking, for each sample application (Contact Data 
Base, CODS, GEM, NetAnalyst, ToolBase), the architecture styles it exercises.] 


Table 2: Representative Applications and their Architecture Styles 

The representative applications and their relationship to the generic architecture styles of Table 2 
are briefly described below: 

• Contact Data Base : The Contact Data Base is a very simple system for managing contacts 
on a project-by-project basis. Contacts are managed at the level of tracking individuals 
associated with a project, individual meetings, and tracking tools employed on the project. 
This application demonstrates a forms based interface for data entry and reporting. 

• Co-Operative Document System (CODS) : The Co-Operative Document System allows 
multiple people to work on the same document. The basic capability of checking a document 
into and out of a document control system is augmented with a message broadcasting feature 
alerting users of a subscribed document’s state. This represents a client server system with 
data streaming and on-line transaction architectural components. 

• Graphic Enterprise Modeler (GEM) : The graphic organization display provides the ability to 
model graphically the structure of a corporate organization. It visually illustrates relationships 
among people, projects, and teams. To find answers to specific questions regarding an 
organization, the user follows semantically meaningful links and uses active graphics controls. 
This application demonstrates the user interaction style of the active graphics variety. 

• NetAnalyst : This application is a map based data visualization tool. It takes a set of real 
telecommunications data (the 1994 L.A. earthquake phone traffic) and plots it geographically. 
This is a common type of application profiling decision support and mapping. 

• ToolBase : This is an Intranet based front end to a product tracking database. This application 
provides for the evaluation of many types of Internet technologies and the extent to which they 
can support the architectural styles of OLTP and decision support on the Intranet. 

Returning to our evaluation process, an appropriate application is selected to test the tool class, 
and development begins against a set of specifications describing the sample application. 
Often, the specifications need modification, or additional software design effort is needed 
to fully stress the products under evaluation (e.g., our Internet application did not test 
multimedia features as initially designed). Inferior products fail during implementation of the 
specifications and quickly drop out. 
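
Selecting an appropriate application amounts to matching the styles a tool class must support against the styles each sample application exercises. The sketch below assumes a mapping read off the prose descriptions above (not the cells of Table 2), and the names `APP_STYLES` and `rank_apps` are hypothetical.

```python
# Styles exercised by each sample application, as stated in the descriptions above.
APP_STYLES = {
    "Contact Data Base": {"forms"},
    "CODS": {"client server", "data streaming", "on-line transaction"},
    "GEM": {"active graphics"},
    "NetAnalyst": {"maps", "decision support"},
    "ToolBase": {"on-line transaction", "decision support"},
}


def rank_apps(required_styles):
    """Rank sample applications by how many of the required styles they exercise."""
    required = set(required_styles)
    hits = {app: styles & required for app, styles in APP_STYLES.items()}
    # Most styles covered first; ties broken alphabetically for a stable order.
    return sorted((app for app, h in hits.items() if h),
                  key=lambda app: (-len(hits[app]), app))
```

An empty result signals that no existing sample application exercises the required styles, which is the case where the specifications need expansion.
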


4.6 Record findings against the templates 

Throughout the work of building the sample application, feature performance data must be 
captured on the custom template constructed for this technical category. This includes objective 
and subjective measures. Subjective data includes how intuitive the product was or how friendly 
the help desk was when called. Objective data includes if the promised features worked and if you 
could accomplish the task of building the sample application. 

The Weighted Scoring Method (WSM) is normally used to provide a simple rating mechanism for 
each product under evaluation. In this method each criterion in the criteria matrix is assigned a 
weight, and the product is usually given a score of 1 to 5 for each criterion. An overall 
score can then be derived using the formula below (Konito, 1996): 


$$\mathrm{Score}_a = \sum_{j=1}^{n} \left( \mathrm{weight}_j \times \mathrm{score}_{aj} \right)$$
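
As a worked illustration of the weighted sum, the sketch below computes the overall score for one product. The function name `wsm_score` and the example criteria, weights, and scores are hypothetical; only the formula itself comes from the text.

```python
def wsm_score(weights, scores):
    """Weighted Scoring Method: sum of weight_j * score_aj over all criteria j.

    weights: dict mapping criterion name -> weight
    scores:  dict mapping criterion name -> score given to product a (e.g. 1 to 5)
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"product has no score for: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Hypothetical example: three criteria and two products.
weights = {"documentation": 2, "support": 3, "pricing": 1}
tool_a = {"documentation": 4, "support": 3, "pricing": 5}
tool_b = {"documentation": 3, "support": 5, "pricing": 2}

print(wsm_score(weights, tool_a))  # 2*4 + 3*3 + 1*5 = 22
print(wsm_score(weights, tool_b))  # 2*3 + 3*5 + 1*2 = 23
```
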


4.7 Judge the best scores and select the recommended product 

The final step is recommending a product. All products on the short list are evaluated. Using 
the sample application as a test suite, the superior product normally emerges. With a WSM 
technique there is little chance of a tie. The analyst must, however, still exercise 
their best judgment in selecting a product for recommendation. 
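
The judging step can be sketched as a ranking of overall scores that also reports any tie at the top, so the analyst knows when judgment alone must decide. The function name `rank_products` and the sample totals are hypothetical.

```python
def rank_products(totals):
    """Rank products by overall score and report which products share the top score.

    totals: dict mapping product name -> overall weighted score
    Returns (ranked, leaders): ranked is a list of (name, score) pairs, best first;
    leaders lists every product with the top score (more than one means a tie).
    """
    if not totals:
        return [], []
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    top_score = ranked[0][1]
    leaders = [name for name, score in ranked if score == top_score]
    return ranked, leaders
```

A single leader is only a candidate recommendation; side-by-side performance on the sample application still informs the final choice.
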


5. EVALUATION PROCESS RESULTS 

Within a laboratory environment we developed these representative applications repeatedly using 
different software technologies. We also carried out other tasks in support of this simulated 
development work, such as configuration management, using still more products under 
evaluation. This approach provided clear evidence of the suitability of one product over another 
and was much easier to derive than by only looking at a feature capability matrix. We had a high 
degree of confidence that the product would work on a real development project using this 
method. 

Dozens of tools have been evaluated using this method and still others are currently under 
examination. From this work many standard products have been chosen that are now part of 
AT&T’s overall body of internal technical standards. Through controlled introduction using pilot 
projects and consultative jump-starts many of these products have also proven to be successful 
on large-scale software projects. Recently this technique was also used successfully to evaluate 
over 30 software products used in Internet based development projects. 


6. PORTING THE PROCESS 

Deployment of this technique to a different environment requires minimal modifications. We have 
reused this process from the evaluation of Windows based tools to the evaluation of Internet 
based tools seamlessly. To transfer this process to a different development base or user 
community we recommend making the following changes: 

1. The tool taxonomy must be recalibrated to fit your environment and goals. Our 
taxonomy does not address databases, office automation, or operating systems. You 
need to add the appropriate technologies to fit your computing framework. 

2. Your architectural styles may vary from ours. We develop very few “hard” real-time 
systems or embedded systems of any kind since our spin-off of Lucent Technologies. 
There may be other significant architectural styles you will need to identify. 

3. After adjusting the framework and architectural styles you now need to document your 
screening criteria and create your detailed evaluation criteria templates. A good 
template typically requires a couple of days for an analyst to create. They are 
reusable and typically only one is necessary per technical category. 

4. Execute. This is the crucial step where the watch-word is “emulation”. That is, 
emulation of your actual development process and tasks. 

We are confident that by following these simple steps the process we have been using for the last 
two years can be re-deployed in any software development technology evaluation laboratory. 


7. CONCLUSIONS 

Using applications derived from clearly relevant architectures keeps the evaluation process 
honest. Analysts with development backgrounds typically feel more comfortable building an 
application than acting as a software critic. Simulating the development tasks in this way does not 
solve all the problems with technology evaluation. Politics and compromise are inescapable 
factors when making decisions that will commit a corporation to spending or not spending large 
sums with any given vendor. Also, some variability remains in the scoring technique. Each analyst 
tends to have peculiar habits in working through a 200 item feature matrix. One may score “high” 
or “low” while another may include “medium”. Nevertheless, we feel confident that architecture 
styles add a healthy modicum of extra validity to the otherwise typical process we have described. 


SOFTWARE TOOLS AND STANDARDS AT AT&T 


• Gartner Group estimates 40,000 software tools on the market 

• AT&T Has Hundreds of Projects Ongoing at any One Time 

• Training, Integration, Portability, Quality Drive Standards 



HOW WOULD YOU CHOOSE 
A FEW DOZEN TOOLS 
FOR CORPORATE WIDE 
DEVELOPMENT NEEDS? 


EVALUATION APPROACHES REVIEWED 


Informational 

• Case Study Reviews 
• Questionnaires 
• RFI 
• Vendor Demos 
• Published Reviews 


Experimental 

• Weighted Averaging 
• Benchmarking 
• Figures Of Merit 
• Sample Applications 
• Pilot Projects 


- Techniques we Favored 


ARCHITECTURE STYLES INTRODUCED 


What is an Architecture Style? 

A set of operational characteristics common to a family of 
software architectures and sufficient to identify that family. 

AT&T study yields four dominant styles : 

* Transaction 

* Data Streaming 

* Real Time 

* Decision Support 

* Most systems are Architectural Hybrids 


MORE ON ARCHITECTURE STYLES 

USER INTERFACE STYLES 

• Forms 

• Documents 

• Active Graphics 

• Alert Panels (i.e., mail program) 

• Maps 

• Hypertext 


EVALUATION PROCESS SUMMARIZED 


1) Survey the available products 

2) Classify according to a technical framework 

3) Filter the list using screening criteria 

4) Construct evaluation criteria templates 

5) Use tools to build Architecturally Representative Applications 

6) Record findings against the templates 

7) Judge the best scores and select the recommended product 


SURVEY TECHNIQUES 


• Literature Reviews 

• Trade Shows and Technical Conferences 

• Direct Mail 

• Automated Topic Searches 

• Web Browsing 

• Vendor Demonstrations 

• Evaluation Copies 

• Private Contact Network 

• Experience 

• Project Reference 


AN OVERALL TOOL FRAMEWORK 

[Figure: tool framework. Life-cycle phases: Requirements Definition, Analysis & Design, 
Implementation, Test, Release & Support. Supporting layers: Content Creation, System 
Documentation, Software Configuration Management.] 

Utz, 1992 

Copyright © 1996 AT&T 


SELECTED TAXONOMY SUBCATEGORIES 


PROCESS/PLANNING/METRICS/REQ. 
Process Definition & Compliance 
Project Planning 
Function Points 
General Metrics 
Requirements Trace 

ANALYSIS & DESIGN 
Object Oriented Analysis & Design 
Structured or Other Design Methods 
RDBMS Modeling 

IMPLEMENTATION 
Languages 
Editors 
Compilers & Debuggers 
IDEs 
GUI/Visual Development 
Cross Platform Development 
Database Development 
Components 

V&V 
Test Management 
Test Design & Generation 
Record & Playback 
Stress, Load, & Performance 
Coverage 

RELEASE & SUPPORT 
Distribution 
Reverse Engineering 
Emulation & Utilities 

DOCUMENTATION & WORKFLOW 
System Documentation 
Help Authoring 
Web Authoring 
Workflow 

SCM 
Source Code Control 
Defect Tracking 
Configuration and Manufacturing 


RESULTS: SCORING & RECOMMENDING 


• Recursive Development Efforts Yield Feature Scores 

• Simple Weighted Average Applied 

$$\mathrm{Score}_a = \sum_{j=1}^{n} \left( \mathrm{weight}_j \times \mathrm{score}_{aj} \right)$$

• Scores + Objective Side-by-Side Performance 

on Sample Application Determine Recommendation 


AN EXAMPLE: RDBMS MODELING 


• Needed RDBMS Reverse Engineering & Modeling 

• Selected CONTACT Application 

- Reuse GUI Forms and DB created for earlier eval 

- Good Reverse Engineering candidate 

- Modify Schema and Rehost on new RDBMS 

• Many integration, support, and administrative problems 


REQUIREMENTS & ARCH STYLE 


Table of Contents 

1. OVERVIEW 

2. CONTACT INTRODUCED 

3. JUSTIFICATION FOR CONTACT 

4. FEATURE REQUIREMENTS 

5. ARCHITECTURE OPTIONS 

6. DATA SCHEMA 

6.1 Information Model 

6.2 Database Schema and Tables 

6.3 Future Additions 

7. USER INTERFACE 

7.1 Command Buttons 

7.2 Combo Box 

7.3 Menus 

7.4 Tool Bar 

7.5 Icons 

8. USAGE SCENARIOS 

9. CONCLUSIONS & NEXT STEPS 

10. REFERENCES 



CONTACT Certifies: 
OLTP + DSS + Forms 


IMPLEMENTATION RESULTS 



Where is stability and multiple database support? 


Analyst : I choose Access ODBC. If I use a name such as "customer name" (note the space) as a field 
name for an element in a record, I get the message "invalid field name" while I generate schema. 
However, I can create a table with a field name "customer name" directly in Access. Is this a 
problem in TOOL-ABC? 


Vendor : I need to try MS Access Jet ODBC instead of using Access 2.0. 


Analyst : I choose Watcom 4.0. In the database engine, after generating a schema, I want to change 
some of my records. When I choose a record and click "Edit", TOOL-ABC exits automatically and 
goes to the DOS prompt. After restarting Windows, the project is now “Exclusively locked”. 
The project cannot be deleted or renamed. 

Vendor : In order to recover my project, I was instructed to go to the project directory from Windows 
File Manager, then delete some files and copy other files, etc. No reason given for the problem. 


EXAMPLE 2: HTML AUTHORING FOR DEVELOPMENT 


• Needed Update for HTML Authoring Recommendations 

• Created New Application: ToolBase 

- Existing forms based OLTP/DSS Application 

- Redesign for Hypertext Browser 

- Implement as interactive WWW DB app 

• Realized Need For Additional Modifications: 

- Originally built as simple UI 

- Re-fit with extensive images to test graphics toolkits 


REQUIREMENTS & ARCH STYLE 



Contents 

Database schema 
Selection forms 
Report layouts 
Usage descriptions 
Table definitions 
HTML prototype 


ToolBase Certifies: 
OLTP + DSS + WWW 
Hypertext Interface 


IMPLEMENTATION RESULTS 



No Support for Database Connectivity 
Poor Selection of GUI Widgets 


Limited Visual Alignment Capabilities wrt Req. 
Generally Poor support of Native HTML Editing 


ASSESSMENT TOTALS PER CATEGORY 


TOOL TYPE                  COMPLETED EVALS 

• GUI Development          3 
• RDBMS Modeling           4 
• GIS                      2 
• HTML Authoring           6 
• Java IDEs                4 
• Web Database Tools       2 
• Software Testing Tools   4 
• SCM                      4 


BENEFITS & CHALLENGES 



Ties Evaluations to Actual Development Tasks 
Produces Representative Apps to Daisy-Chain Evals 
Supports Difficult Decision Making Task with Objective Data 
Creates Excellent Demonstrations for Consulting 



Does Not Provide Escape from Politics 

Process Requires Some Education for Each Participant 

Some Variability and Subjectivity Remains 

(especially in applying scoring techniques consistently) 


CONCLUSIONS 


Advising ourselves on starting over: 

• Establish dedicated lab space 

• Secure superb technical support 

• Rotate talent 

• Assure top-down management support 

• Expand internal communication efforts 

• Invite more vendor “bake-offs” (let them build it!) 

• Stay the course on architecturally relevant samples 


Systematic Process Improvement 
in a 
Multi-site Software Development Project 


H. Hientz, G. Smith, A. Gustavsson, 
Q-Labs Software Engineering GmbH, 
Germany 




Abstract This paper reports on the application of the 
PERFECT¹ Improvement Approach and specifically 
goal-oriented measurement via GQM² in a large multi-site 
software development project. Successful and persistent 
implementation of an Experience Factory and the GQM 
approach in a large multi-site project organization is a 
challenging task and needs to be based on sound and 
operational methodological support to face all the practical 
problems and resistance which occur in the course of a 
software process improvement programme. In the paper 
we present both measures and experiences of applying 
goal-oriented measurement as well as experiences from 
introducing systematic process improvement based on 
measurement. 

Keywords: Systematic process improvement, software 
measurement, Goal Question Metric paradigm, Experience 
Factory approach 

1. Introduction 

When introducing persistent process improvement in an 
organization there is a need for an underlying 
framework defining which activities need to be carried out in 
order to get lasting results. Recent results, e.g. the Experience 
Factory [6] and GQM [2], point to the need to introduce: 

• explicit modelling of products, processes and quality 
aspects in order to understand the building blocks in 
software development and to be able to tailor them 
for specific needs, measure their adherence within 
different projects and improve them across 
projects. 

• comprehensive reuse of models, knowledge and experience 
in order to choose appropriate models for new 
projects and to compare actual project data with baselines. 

• measurement integrated with the software development 
in order to define quality goals, understand differences 
between projects and control whether quality 
targets have been met. 


1. Process Enhancement for Reduction of software 
deFECTs. 

2. Goal/Question/Metric 

In this paper we report on the experiences in establishing 
such a process improvement program using the PERFECT 
project [1] results as a methodology basis. PERFECT 
should be viewed as one possible instantiation of the Experience 
Factory concept and provides a more detailed 
description of how to implement the process improvement 
framework. The organization described in this paper 
already promoted explicit modelling of products and processes. 
The next obvious implementation step was to integrate 
goal oriented measurement to create better 
understanding of the current baselines and thus in the subsequent 
projects better facilitate the future reuse of experiences 
and achieve improvements across projects and sites. 
This document describes the results of introducing goal 
oriented measurement and the first implementation steps in 
order to set up an experience factory. 

The paper is organized as follows. Chapter 2 gives an 
overview of the application project, its characteristics and 
organization. Chapter 3 introduces the process 
improvement framework that was used. Chapter 4 focuses 
on how goal oriented measurement was applied using the 
GQM approach [2]. The method used is described in detail 
together with examples from the collected measures and 
the analysis. Finally, Chapter 5 presents the conclusions. 

2. The Multi-site Improvement Project 

The target for the process improvement was a software 
development project of 350,000 man-hours of effort over one 
year for the development of a new release of a product in 
Ericsson’s GSM mobile telephony range. It was a collaborative 
development involving five separate design centres. 

It was the aim to carry out process improvement in a systematic 
way rather than the ad-hoc approaches that usually 
characterize process improvement programmes. This 
means that: 

• a systematic model of process improvement was used 
to provide a framework for the programme, 

• the improvements were run within a separate improvement 
project with its own budget, plans, organization 
and reporting structure. This project ran ‘in parallel’ 
with the target software development project. 

What is meant by a “systematic approach to process 
improvement” is mainly the fact that the programme 
should be based on established models. In this project, 
there were several models underpinning the project: 

• a process improvement project structure 

• a process improvement organization 

• the ‘PERFECT’ Model of Process Improvement [1] 

• the GQM Approach for goal-oriented measurement [2] 

• an approach to technology transfer 

The main structure of the project is illustrated in figure 1, 
which shows: 

• the gathering of experiences from the previous project 
during the pre-execution phase 

• the use of the methodology framework from the ‘PERFECT’ 
project, and the feedback of experiences 

• the methodology development activities providing 
improved methods and processes to the technology 
transfer part 

• the interaction with the target project at the five sites 
through the technology transfer activities 

• feedback from the target project 

• the analysis activity after target project termination 

It was necessary to set up an organization to carry through 
the improvements. To ensure that the process improvement 
programme maintained close contact with the design 
teams, Process Improvement Teams (PITs) were set up in 
each site. They consisted of project members from that site 
and their role was to ensure a good two-way flow of information, 
ideas, and feedback between the process improvement 
programme and the design teams. The multi-site 
organization of the project consisted of: 

• A multi-site ‘Process Improvement Coordination 
Team’ (PICT) 

• Process Improvement Teams (PITs) in each site 

• Process improvement consultants (Q-Labs) 

The Process Improvement Coordination Team (PICT) 
was, as the name suggests, intended to coordinate and harmonize 
the activities across all sites in the project. 

This organization was deliberately ‘bottom-up’, i.e. the 
driving force behind the programme was intended to be the 
site PITs to ensure that the improvement proposals accurately 
reflected the real needs of the users. 


Figure 1 : The Multi-site project organization 


3. The PERFECT Process Improvement 
Framework 

The process improvement approach described in this paper 
is the result of the European ESPRIT project PERFECT [1], 
especially the organizational structuring of the 
improvement project and the goal-oriented measurement 
parts. When it started in September 1993, the PERFECT 
project had the goal of packaging, for European industry, 
methods and models for establishing measurement-based 
initiatives aimed at evolutionary improvement of software 
development processes relative to company-specific goals. 

The PERFECT Improvement Approach is based on the 
technologies developed by Basili et al.: the Quality 
Improvement Paradigm (QIP) for the methodological 
aspect, the Experience Factory (EF) for the organizational 
aspect and the Goal-Question-Metric (GQM) method for 
the goal oriented measurement activities, see for instance 
[2], [5], [6]. The PERFECT project detailed these concepts, 
enhanced them with activities, and packaged them 
for use within the trial applications, of which PICME was 
one. At the closure of the PERFECT project all developed 
methodologies were packaged in booklets as deliverables. 

The PERFECT Improvement Approach Experience 
Factory Model (PEF) 

PEF is based on the existing material from SEL as well as 
undocumented experiences. In addition to this we have 
used the industrial experience from the European software 
industry to adapt the model and add necessary areas. The usage 
within the PICME project, together with the other 
applications, gave many useful comments for updating and 
evolving the PEF when it comes to roles, responsibilities 
and activities. 

In the PEF model the EF (Experience Factory) is one part 
out of three, see Figure 2. The other parts are: the software 
development projects that are executed as case studies, 
supported by the EF; and the sponsoring organization in which 
the projects as well as the EF reside. All three parts are 
equally important in establishing an effective improvement 
initiative. 

Experience Packages 

The PEF model is based on a focused and simplified model 
of an Experience Package. It includes three parts: process 
model, process control model (quality model), and process 
experience. The first part is what is traditionally handled 
by training and experts; the second handles the GQM 
and measurement parts; while the third focuses on the 
actual data, the conclusions and new hypotheses that can be 
drawn based on the process model, the measurements and 
the project characteristics. 


The EF in PEF 

The EF in the PEF model consists of three focus parts, as 
can be seen in figure 2. One part handles the issues of the 
overall improvement work (the Strategic Improvement 
Management); one handles the specific issues of each 
separate software development project that is supported 
by the EF (the Project Support); and one handles all specific 
project results that should be analysed and then 
generalized/synthesized for the whole organization (the 
Experience Package Engineering). 

Comparing the EF in the PEF with the NASA/SEL EF 

Compared with the NASA/SEL EF, more emphasis has been 
put on placing the EF into a context within an organization, 
especially through the modular approach, which emphasizes 
the importance and clarifies the tasks of the different 
areas both outside and inside the EF. 

From the outside in, the following can be noted: 

* The roles in the sponsoring organization that are necessary 
to establish an EF and the improvement initiative 
have been made explicit. The connection to the business 
goals and market situation, the internal organizational 
development and the short term economic 
interest of the organization are described. 


Figure 2: The PERFECT Experience Factory, PEF 
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In the projects it is suggested that the project itself 
should take responsibility for the measurement col- 
lection and validation. Measurements must be an 
active part of managing each project. 

For the EF it is suggested that the overall issues of run- 
ning the EF also must be addressed more systemati- 
cally and goal oriented. That includes support for 
systematically selecting new technologies to experi- 
ment with and introduce, the actual running of an 
improvement programme in a goal driven manner 
and the handling of change on personnel and organi- 
zational level. 

The need for active support of each software project is 
also highlighted. This is one, often neglected, success 
factor in process improvement It is highlighted the 
need for different kind of support, i.e.: process train- 
ing and coaching; setting up efficient goal oriented 


Experiences from Experience Capturing from 
Perfect Application projects 

Since the PEF model evolved to its current shape from 
project feedback late in the PERFECT project the applica- 
tions did not have enough time for a full implementation. 
When mapping organisational entities and evaluating the 
activities in the project, it is reassuring that the activities 
in the PEF model were either already performed or were 
found to be needed in the analysed application. Especially 
promising were comments like “the PEF model 
would have helped us organize the improvement initiative 
better” from one application provider. 

Identifying instances of the PEF within the PICME project, 
the PICT could be viewed as the strategic improvement 
management force and the PITs as the project support 
functions of the PEF. 


Figure 4: GQM V-Model 



measurement programmes for the project; and sup- 
port in reusing (identifying, understanding and apply- 
ing) the experiences (conclusions and hypothesis) 
from previous projects. 

The third part of the EF in the PEF model is the one 
with direct similarities to the NASA/SEL EF. The 
distinction here is the structure of the activities, i.e., 
following the basic structure of the Experience Pack- 
age: Process Model, Process Control Model and Pro- 
cess Experience. 


In the PICME project the PERFECT Improvement 
Approach, as far as it had evolved, was used in part, by 
applying the steps of the QIP and making extensive use of 
the GQM approach. 
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back, 6) package experiences for future reuse. The mea- 
surement process varies depending on the purpose of the 
analysis goal, e.g., assessing a delivered software product 
versus evaluating a software development process, reusing 
GQM measurement goals and experiences versus execut- 
ing a measurement programme from scratch. 

The process steps are not followed in a waterfall like way; 
iterations should be considered, i.e., completion criteria 
must be defined and checked. A more formal description 
and details from experience can be found in [1]; [9] provides 
lessons learned regarding measurement-based process 
improvement. 

The Measurement Goals 

The GQM analysis goals were prioritized according to the 
improvement goals of the organization, e.g., reduce time to 
market by 20%, and the catalog of improvement proposals 
targeted by the improvement programme. The GQM-based 
measurement goals were integrated into the existing corpo- 


Figure 3: High-level GQM Process Model 





4. Goal-oriented Measurement with GQM 
The Reference Model 

Conducting a measurement programme in a large 
multi-site organization has to be done following an explicit 
measurement process, using well defined measurement 
artifacts and involving a diversity of project staff from 
management to software development engineers. The 
GQM V-Model (Figure 4), as defined in and for this 
project, provides a reference model to illustrate, communi- 
cate and guide the measurement programme. It provides 
the ability to explain and trace the measurement approach 
followed (ad-hoc/bottom-up versus goal-ori- 
ented/top-down), the involved roles (from viewpoint of the 
analysis task to the data provider), required artifacts (anal- 
ysis goal to data collection sheets), and key activities such 
as refinement, verification and validation steps. 


The Measurement Process 

Having the GQM V-Model in place a high level measure- 
ment process is used to enact the measurement pro- 
gramme. The underlying measurement process (figure 3) 
consists basically of six steps: 1) characterization of envi- 
ronment, 2) set GQM measurement goals, 3) develop 
GQM models and produce measurement plan considering 
the reuse of existing experiences and models, 4) collect 
and validate data, 5) analyse data and provide project feed- 


rate wide measurement programme of assessing the 
projects performance in terms of Productivity, Quality, and 
Leadtime. The GQM goals targeted Inspection 
Efficiency, Teamwork Effectiveness, Work Allocation 
Fitness, Stability of Requirements, and Applied Design 
Process Performance. All of these goals were analysed in 
detail, but because of the limited scope of this paper and for 
confidentiality reasons only Teamwork Effectiveness is 
reported here. The relation between corporate 
and project concerns was captured in a GQM 
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‘goal-tree’ which is illustrated in Figure 5. The underlying 
software development process (here: prescriptive, water- 
fall like) must be respected which has a major impact on 
all the facets of the GQM goal, i.e., object of study, purpose, 
quality focus, viewpoint, and context. Also the scope 
of the measurement goals should be constrained based on 
the resources dedicated to the measurement programme 
and the organization’s maturity. Maturity is defined in 
terms of stability of the processes in place and the ability to 
adhere to them. 
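
The five goal facets named above can be captured in a small record type. The following Python sketch is illustrative only: the class name `GqmGoal`, its field names and the wording of the rendered sentence are assumptions made for this example, while the facet values are those given for the Teamwork Effectiveness goal in Figure 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GqmGoal:
    """One GQM measurement goal, described by the five facets named in the text."""

    goal_id: str
    object_of_study: str
    purpose: str
    quality_focus: str
    viewpoint: str
    context: str

    def statement(self) -> str:
        """Render the goal as one sentence (the wording is an illustrative assumption)."""
        return (
            f"Analyse {self.object_of_study} for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} from the viewpoint of "
            f"{self.viewpoint} in the context of {self.context}."
        )


# Facet values as listed in the Teamwork Effectiveness abstraction sheet (Figure 6).
teamwork_goal = GqmGoal(
    goal_id="STG1.3",
    object_of_study="teamwork",
    purpose="baselining",
    quality_focus="effectiveness",
    viewpoint="PICT",
    context="Project X",
)

print(teamwork_goal.statement())
```

Keeping all five facets mandatory makes it harder to write a goal whose scope is left open, which is one way of respecting the constraint on scope described above.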


responsible roles for providing the data. Tools for data col- 
lection were either based directly on existing ones, e.g., 
time reporting system, paper/email-based questionnaires 
or enhanced existing tools, e.g., inspection record collec- 
tion tool. 

A simple spreadsheet application is sufficient to process all 
the collected data and aggregate them to the level of data 
analysis charts. Tool support should respect the principle 
of GQM, i.e., goal orientation. Currently web technology 
is being investigated, as part of the ‘engine room’ concept, 
which increases transparency and improves the access to 
FAQ’s, glossaries, instructions, etc. 



Figure 5: GQM goal-tree 
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The GQM Models 

GQM abstraction sheets are useful for refining the GQM 
goals during interview sessions held with viewpoint repre- 
sentatives, affected project staff, and line representatives. 
The abstraction sheet for baselining the Teamwork Effec- 
tiveness is depicted in Figure 6. The derivation of the vari- 
ation factors (VF) is guided by categories of factors which 
are considered to have a main impact on the object of study 
([2]), e.g., domain conformance as VF 1 and VF 2, process 
conformance as VF 3 to VF 7. Likewise, the quality focus 
is defined based on the knowledge of the target environ- 
ment which is based on the viewpoint’s experience. 

The Data Collection 

Data collection is triggered by periodic activities, e.g., 
weekly time reporting, process states, e.g., begin/end of 
phases or entry/exit criteria, and artifact state transitions, 
e.g., inspected documents. The triggers determine the 


Data Analysis 

Data analysis was done without involving sophisticated 
statistical support. Nevertheless, validation of the variation 
hypothesis (figure 7) and a comparison of the actual data 
with the baseline hypothesis (figure 8) were performed in 
regular feedback sessions. 

To a limited extent the Rough Set approach was applied 
([4], chosen among other approaches, e.g., [7]) to analyse 
and package the measurement results. The Rough Sets 
Approach ([8]) is based on a learning by example theory; it 
has been used as a methodological tool for handling 
vagueness, uncertainty and noise in the collected data. With 
respect to the Teamwork example used in this paper we 
identified the stability of the team composition (variation 
factor 6 in figure 6) as being the core attribute for explain- 
ing the performance of the teams with respect to the team 
spirit (quality focus attribute 3 in figure 6). 
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Figure 6: GQM Abstraction sheet - Teamwork Effectiveness Model 


Goal: STG1.3    Object: teamwork    Purpose: baselining 
Quality Focus: effectiveness    Viewpoint: PICT    Context: Project X 

Quality Focus 

Effectiveness of teamwork: 

1. degree of compliance to team plans (effort, quality of team 
   deliverables, adherence to internal teamwork processes) 
2. spreading of competence 
3. team spirit 

Impact on Quality Focus (variation factors) 

1. previous teamwork experience 
2. suitability of defined process for teamwork 
3. team size 
4. % of time devoted to teamwork 
5. balance of competence within the team 
6. stability of team composition 
7. degree of freedom of team to develop own plans 

Baseline Hypothesis (a) 

1. current degree of compliance to team plans = ? 
2. current spread of competence = ? 
3. current level of spirit = ? 

Impact on Baseline Hypothesis 

1. a lot of experience in working in teams increases the team effectiveness 
2. inappropriate practices and processes reduce effectiveness of teamwork 
3. inappropriate team size decreases effectiveness of teamwork 
4. reduced time devoted to teamwork reduces team effectiveness 
5. a good mix of skills is necessary for effective teamwork and to spread 
   competence within the organization 
6. frequent changes of membership in the team reduce team effectiveness 
7. empowering teams to do their own planning increases team commitment to 
   those plans, which in turn increases the chances of compliance to the plans 

a. The actual values were unknown, therefore the assumed values were stated and validated as shown in 
figure 8. 


Main constraints during the analysis task were 

• the lack of an underlying descriptive software process 
model, leading to uncertainty in the reliability of data 
collected, 

• the strict goal orientation during analysis, and 

• the inherent characteristics of software engineering 
data in general ([3]). 

The main purpose of this measurement programme was 
‘baselining’. This implies that validating the hypotheses 
about the variation factors is less important than 
validating the baseline hypothesis (figure 
8). But, because the purpose will change to ‘control’, the 
GQM models will evolve and validation will become a key 
issue. The ultimate goal for measurement must be 
‘improvement’. 


Improvement Opportunities 

Three main sources for improvements of the software 
development process could be identified through 

• the analysis results from the measurement goals, i.e., 
quantitative understanding, 

• the GQM modelling task itself, i.e., qualitative under- 
standing, and 

• the enactment of goal-oriented measurement pro- 
gramme, i.e., analysis and trace of execution prob- 
lems. 

They uncover problems with the actual software develop- 
ment process, the software products delivered and the 
management of the software projects. 
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Figure 7: Validation of Variation hypothesis 
Teamwork Effectiveness Model 
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Conclusion 


cases in a slightly modified form. 

The organization set up to pursue the improvement 
programme is continuing, although with a slightly modified 
composition, and is gradually starting to assume the role of 
‘keeper of the process experience base’ and ‘Strategic 
Improvement Management’ [1] group. Although the 
PERFECT Approach for Improving Software Processes [1] 
provided a conceptual framework and useful reference model, 
we had difficulty in really putting it into practice. Although 
the underlying ideas were well-established, the practical 
details of the method were still evolving during the time of 
this project and practical experience of their use was not 
available. Real-world examples are needed for guidance. 

The emphasis on process coaching to support technology 
transfer and as a way of raising process adherence has raised 
awareness of process issues in the organization, even if it has 
not yet resulted in a noticeable increase in process adherence. 

The main emphasis in the project was on the application of 
goal-oriented measurement and the creation of a quantitative 
process baseline. The project was largely successful in both 
of these areas and the same GQM models are continuing to 
be used in two follow-up projects with slight modification. 
The main lesson learned from this first round of measure- 
ment is the need to start small and build up as the organiza- 
tion’s measurement maturity grows. Despite having known 
this at the start, we still ended up with a measurement plan 
that was too ambitious and severely taxed the ability of the 
sites to collect and report data accurately and in a timely 
fashion. This was perhaps an inevitable consequence of the 
global project-wide scope of the measurement programme. 
Current measurement programmes are being more narrowly 
focussed on specific process areas. The area where we have 
had to be most innovative is analysis and interpretation of the 
results and presentation of these to project personnel. 
Meaningful presentation models are needed to reveal trends in 
the data and impacts between the variation factors and the 
quality focus. The two diagram types for validation of the 
‘Variation hypothesis’ and ‘Baseline hypothesis’ worked well 
but more needs to be done. The use of Web technology to 
disseminate results will help in motivating the measurement 
activities and in the feedback of results. Finally, even in this 
measurement round, some useful insights have been gained 
into aspects of the process that were not previously 
understood. This has led to corrective actions in subsequent 
projects, which is ultimately what justifies continuation of 
the investment in measurement. 


In this project we have attempted to apply systematic 
process improvement in a large multi-site software 
development. The results have been mixed but more positive 
than negative. The final proof is that most of the 
innovations are continuing in subsequent projects, albeit in some 
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Figure 8: Validation of Baseline hypothesis - Teamwork Effectiveness Model 


[Chart not reproduced. Vertical scale: low, normal, high.] 

Quality Focus 

1. Compliance to team plans 
   1.1 weekly document status % 
   1.2 effort % 
   1.3 team process adherence 
2. Competence spread 
3. Team spirit 

Variation factors 

1. Teamwork experience 
2. Suitability of team process 
3. Team size 
4. Teamwork effort % 
5. Competence balance 
6. Composition stability 


References 

1 PERFECT Consortium, D22A. ‘The PERFECT Approach 
for Improving Software Processes’. ESPRIT Project No. 9090, 
1994. 

2 V. Basili and H. D. Rombach, “The TAME Project: Towards 
Improvement-Oriented Software Environments”, IEEE Transactions 
on Software Engineering, 14 (6), pages 758-773, June 1988. 

3 L. Briand, V. Basili, and W. Thomas, “A pattern recognition 
approach for software engineering data analysis”, IEEE Trans. 
Software Eng., vol. 18, no. 11, Nov. 1992. 

4 Q-Labs, “How could Rough Sets be used for GQM-based 
Data analysis”, KL/QLS 96:0354, October 1996 (unpublished). 

5 Victor R. Basili and H. Dieter Rombach. TAME: Integrating 
measurement into software environments. Technical Report 
CS-TR-1764 and TAME-TR-1-1987, Department of Computer 
Science, University of Maryland, College Park, MD 20742, June 
1987. 


6 Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach. 
Experience Factory. In John J. Marciniak, editor. Encyclopedia of 
Software Engineering , volume 1, pages 469-476. John Wiley & 
Sons, 1994. 

7 L. Briand, V. Basili, and W. Thomas, A pattern recognition 
approach for software engineering data analysis, IEEE Trans. 
Software Eng., vol. 18, no. 11, Nov. 1992. 

8 J.W. Grzymala-Busse, LERS - A system for learning from 
examples based on rough sets. In R. Slowinski, editor, Intelligent 
Decision Support: Handbook of Applications and Advances of 
Rough Sets Theory, Kluwer Academic Publ., 1992. 

9 Lionel Briand, Christiane Differding, and H. Dieter Rombach, 
Practical Guidelines for Measurement-Based Process 
Improvement, Published as Technical Report for the International 
Software Engineering Research Network (ISERN-96-05), 1996. 





KL/QLS 96:0534 C 96-12-03 


PICME & PERFECT Experience Factory 
Horst Hientz 

NASA/SEL December 4-5, 1996 


Q-Labs Software Engineering GmbH 
Technopark \ 
Kaiserslautern, Germany 




Presentation Overview 

1 PICME - The Improvement Project Structure 

2 PERFECT - The Experience Factory Approach 

3 GQM - The Goal-oriented Measurement Approach 

4 Results from the PICME Project 

5 Lessons Learned from the PICME Project 




Results from the PICME Project 

Costs 

• PI project cost 2.9% of target project 
• GQM cost 20% of PI project (i.e. ≈ 0.6% of target) 

Achievements 

• GQM measurement programme institutionalized: 
  GQM Models, GQM responsible measurement process owner 
• Quantitative baseline (Inspections, Teamwork, Design process 
  performance and their impact on Productivity, Quality, and Leadtime) 
• Experience Factory entities institutionalized, e.g., Strategic 
  Improvement Management organization 
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PICME & PERFECT 


Lessons learned from the PICME Project 

Process Improvement approach 
• Technology transfer and Coaching are crucial 
• Goal-oriented measurement is a prerequisite 
• Must be done in a systematic continuous way (PERFECT Model) 
Goal-oriented measurement with GQM 
• Ambitiously high-level goals 
• Measurement cycles too long for start 
• Measurement plan too ambiguous 
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An Empirical Study of Process Conformance 


Sivert Sørumgård 1 

Norwegian University of Science and Technology 
3 December 1996 



Abstract 




An experiment was carried out at the Norwegian University of Science and Technology 
(NTNU) in order to investigate a concept called process conformance. The experiment was 
based on previous experiments carried out at the Software Engineering Laboratory (SEL) 
(Basili, 1996). The purpose of the experiment was to compare two variants of a process and 
see whether the type of process had an impact on the level of process conformance. Another 
goal was to investigate the correlation between the degree of conformance and deviation in 
product quality. The results obtained so far indicate some evidence that process conformance 
and product quality deviation are correlated. The process type had no apparent impact on 
conformance. 



1.0 Introduction 

Experiences from an experiment carried out in the context of SEL at the University of Mary- 
land (UMD) in 1995 suggested that one problem concerning experiments with software engi- 
neering processes is the question of whether the process under investigation is actually used by 
the subjects in the experiment - i.e. process conformance (Basili, 1996). Hence, we define 
process conformance as: 

The degree of agreement between a process definition and the process that is 
actually carried out. 

Considering the definition above, three problems immediately arise: how to measure the 
degree of agreement, how to define in more detail what agreement means, and finally what is 
meant by a process definition. Here, we will put emphasis on the first problem - how to 
measure process conformance. 

Some related work has been proposed (e.g. Cook, 1994; Cugola, 1995; Miyazaki, 1987; etc.), 
but not in the domain of software process experiments, which tend to study low-level and 
thought-intensive processes. Thus, the current approaches, which are mostly focused on 
higher-level processes, were not considered appropriate. 

This paper describes an experiment that was carried out to investigate process conformance. 
First, the context of the experiment, the goal and hypotheses, and its design are described. 
Then the conformance measurement is explained. Further, the required preparations and the 
execution of the experiment are outlined briefly, before the experimental results are presented. 
Finally, a conclusion summarizes the experiment and presents some ideas about the possible 
future direction of this work. 


1. Complete address: Division of Computer Science, Norwegian University of Science and Technology, N-7034 
Trondheim, Norway 

Email: sivert@idt.ntnu.no, phone: +47 73 59 44 79, fax: +47 73 59 44 66. 
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2.0 Context of the Experiment 

The goal of the experiment was to investigate process conformance. In particular, we wanted to 
compare two variants of the same process where one variant had been modified in order to 
make it more explicit as well as requiring that the subjects deliver intermediate results. 
Modifications such as these result in a process which is defined in more detail, and thus may be 
expected to be easier to follow correctly since the room for interpretation is reduced. The 
process we used was Perspective-Based Reading (Basili, 1996) as applied in the UMD experiment 
referred to earlier (Basili, 1996). 

PBR is a technique for reading requirements specifications in order to find defects. The idea is 
that people read the document from three different perspectives: Design, Use, and Test. In our 
experiment, we applied only the design perspective in order to reduce the number of variables. 


Form E6d - Reading Experiment/Reading Scenario 

Perspective-based Reading 

Perspective based reading is the concept that the various customers of a product 
should read a document in such a way as to find out if the document satisfies their 
needs for it. In doing so it is hoped that the reader will find defects and be able to 
assess the document from their particular point of view. 

Design-based Reading 

Generate a design of the system from which the system can be implemented. Use 
your standard design approach and technique, and incorporate all necessary data 
objects, data structures and functions. 

In doing so, ask yourself the following questions throughout the design: 

1. Are all the necessary objects (data, data structures, and functions) 
defined? 

2. Are all the interfaces specified and consistent? 

3. Can all data types be defined (e.g., are the required precision and units 
specified)? 

4. Is all the necessary information available to do the design? Are all the 
conditions involving all objects specified (e.g., are any requirements/ 
functional specifications missing)? 

5. Are there any points in which you are not clear about what you should 
do because the requirement/functional specification is not clear or not 
consistent? 

6. Does the requirement/functional specification make sense from what 
you know about the application or from what is specified in the general 
description/introduction? 


Figure 1. Process description - PBR. 
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The design perspective, as described by PBR, is characterized by a short description and a set 
of questions. The “designer” is to apply a design technique and make a design for the system, 
and during this process he is to apply the questions in order to identify defects. However, the 
description puts forward no requirements as to which design technique is to be applied. The 
modified version of PBR was made more explicit by requiring a specific design technique 
called OOram (Object Oriented Role Analysis Method) (Reenskaug, 1995) to be applied. 
Another modification was to require the subjects to deliver their design as an intermediate 
result of the process. Hence, variation in process execution could be assumed to be reduced. 
The process descriptions for the unmodified and modified versions are provided in Figure 1 
and Figure 2 respectively. 


Form E6d - Reading Experiment/Reading Scenario 

Perspective-based Reading 

Perspective based reading is the concept that the various customers of a product 
should read a document in such a way as to find out if the document satisfies their 
needs for it. In doing so it is hoped that the reader will find defects and be able to 
assess the document from their particular point of view. 

Design-based Reading 


Generate a design of the system from which the system can be implemented. Use 
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In doing so, ask yourself the following questions throughout the design: 

1. Are all the necessary objects (data, data structures, and functions) 
defined? 

2. Are all the interfaces specified and consistent? 

3. Can all data types be defined (e.g., are the required precision and units 
specified)? 

4. Is all the necessary information available to do the design? Are all the 
conditions involving all objects specified (e.g., are any requirements/ 
functional specifications missing)? 

5. Are there any points in which you are not clear about what you should 
do because the requirement/functional specification is not clear or not 
consistent? 

6. Does the requirement/functional specification make sense from what 
you know about the application or from what is specified in the general 
description/introduction? 
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2.1 Hypotheses 

The experiment was focused on comparing and measuring the degree of conformance. Thus 
we assume that the two process variants will be different as far as conformance is 
concerned. That is, we assume that people tend to follow one process variant more closely 
than the other. Based on this, the hypothesis and its associated null hypothesis for this 
experiment were: 

H1 Subjects applying the modified version of PBR will show a higher degree of 
process conformance than subjects applying the unmodified version of PBR. 

H0,1 There is no difference in process conformance between subjects applying the 
modified version of PBR and subjects applying the unmodified version of 
PBR. 

There are a number of additional hypotheses that are also of high interest in the context of 
process conformance. In this paper, we will also consider the following hypothesis and associated 
null hypothesis: 

H2 Process conformance and deviation in product quality are associated 
variables. 

H0,2 There is no correlation between process conformance and deviation in 
product quality. 

In the following discussion, the modified version of Perspective-Based Reading will be 
referred to as MPBR, while the unmodified version is labelled PBR. 

2.2 Measuring Process Conformance 

In order to test our hypotheses, we need a way of measuring process conformance and devia- 
tion in product quality. One way of doing this would be to observe how the process was carried 
out and then compare these observations with the process description. This can be 
accomplished by collecting a number of observations for each subject’s process execution, e.g. time 
used, and product size and quality, and comparing these observations with predicted values. Or, 
alternatively, the sample means may be substituted for the predicted values, if we assume that 
the average observations represent a typical process execution. 

Based on the set of observations for each subject, we can construct a deviation vector, which is 
a model of how the process execution diverges from the expected performance. A deviation 
vector with two dimensions, time and quality, is depicted in Figure 2. Here, the predicted 
execution of process i is represented by a vector $P_i$, while the actual execution is represented by 
the vector $E_i$. The deviation vector is now defined as the difference between the predicted and 
actual execution, where the value in each dimension is the unsigned difference between 
prediction and execution. Thus, the deviation vector in Figure 2 becomes 
$[\,|T_{i,e} - T_{i,p}|,\ |Q_{i,e} - Q_{i,p}|\,]$. 

The conformance measurement can now be defined as the length of this vector when all the 
dimensions have been normalized by dividing the difference by the expected value so that the 
different dimensions can be combined. In this experiment, we used the observations time, 
product size, and product quality (these will be explained later) in the deviation vector, and 
thereby obtained the conformance measurement for subject i given by Equation 1, where the 
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Figure 2. The positive deviation vector. 

predicted values are replaced by the sample averages. This measurement was used to test 
hypothesis H0,1. 



$$ C_i = \sqrt{\left(\frac{t_i - E(t)}{E(t)}\right)^2 + \left(\frac{s_i - E(s)}{E(s)}\right)^2 + \left(\frac{q_i - E(q)}{E(q)}\right)^2} \qquad (1) $$ 


For testing the second hypothesis, we also need a measure of product quality deviation. 
However, this is exactly the third dimension in the vector above, i.e. product quality deviation for 
subject i is defined by 


$$ dQ_i = \frac{|q_i - E(q)|}{E(q)} \qquad (2) $$ 


Testing the association between the quality deviation given by Equation 2 and the process con- 
formance measurement given by Equation 1 is not reasonable to do since quality deviation is 
also a component in process conformance. Thus, a simplified conformance measurement is 
required for testing the second null hypothesis H0,2. In this simplified measurement, we used 
only the two dimensions time and size, as given by the equation below. 


$$ C'_i = \sqrt{\left(\frac{t_i - E(t)}{E(t)}\right)^2 + \left(\frac{s_i - E(s)}{E(s)}\right)^2} \qquad (3) $$ 
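
The three measurements can be illustrated with a short Python sketch. It is not the analysis code used in the experiment: it assumes that "length" means the Euclidean length of the normalized deviation vector, the function and variable names are chosen for this example, and the three subject records are hypothetical values, not experimental data.

```python
import math
from statistics import mean


def relative_deviation(value, expected):
    """Unsigned difference from the expected value, divided by the expected value."""
    return abs(value - expected) / expected


def conformance(observations, expected):
    """Euclidean length of the normalized deviation vector over the keys of `expected`."""
    return math.sqrt(sum(
        relative_deviation(observations[dim], expected[dim]) ** 2
        for dim in expected
    ))


# Hypothetical observations: time in minutes, adjusted size, defect detection rate.
subjects = [
    {"time": 90.0, "size": 0.50, "quality": 0.30},
    {"time": 105.0, "size": 0.40, "quality": 0.20},
    {"time": 75.0, "size": 0.60, "quality": 0.40},
]

# The sample means stand in for the predicted values, as described in the text.
dims = ("time", "size", "quality")
expected = {d: mean(s[d] for s in subjects) for d in dims}
expected_time_size = {d: expected[d] for d in ("time", "size")}

for s in subjects:
    full = conformance(s, expected)                               # Equation 1
    dq = relative_deviation(s["quality"], expected["quality"])    # Equation 2
    simplified = conformance(s, expected_time_size)               # Equation 3
    print(f"C = {full:.3f}   dQ = {dq:.3f}   C' = {simplified:.3f}")
```

Under this reading a smaller value means a smaller deviation from the typical execution. The simplified measurement leaves out the quality dimension, so it can be compared with the quality deviation without that quantity appearing on both sides.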


2.3 Experimental Design 

We used a fractional factorial design where we blocked the subjects on document order and 
technique type (these variables will be explained later), thus obtaining the design illustrated in 
Figure 3 where the actual number of subjects in each block is indicated in parenthesis. The 
subjects in the experiment were 48 graduate students in their last year of study before the 
diploma thesis. The number of subjects in each block indicated in the figure above are slightly 
uneven because some of the subjects that signed up for the experiment did not show up. Every 
subject read two software requirements specifications that were seeded with a set of known 
defects, and applied a specific technique in order to find the defects, using the same technique 
for both documents. Thus, there were three independent variables as described in Table 1 
below. 
                  Group 1    Group 2    Group 3         Group 4 
Technique         PBR        PBR        Modified PBR    Modified PBR 
Document order    ATM-PG     PG-ATM     ATM-PG          PG-ATM 
First document    ATM        PG         ATM             PG 
Second document   PG         ATM        PG              ATM 
Subjects          13 (12)    12 (11)    13 (12)         13 (13) 


Figure 3. Design of the experiment 


Variable    Scale     Unit   Comments 
Technique   nominal   -      Technique has two values: PBR and MPBR. 
Document    nominal   -      Document can take two values: ATM and PG. 
Order       nominal   -      Order has two values: ATM-PG and PG-ATM. 


Table 1. Independent variables. 


The dependent variables which were collected are summarized in Table 2. The basic variables 
are time, defects, and size. The two latter had to be adjusted for difference in document “size” 
(size in terms of the number of seeded defects). Thus, they were replaced by rates. In addition 
to the variables described in Table 2, a simplified variant of the conformance measurement and 
the measurement of product quality deviation, as given by Equation 2 and 3, were also needed, 
as discussed previously. 


Variable                Scale      Unit     Comments 
Time                    ratio      minute   Time was measured by the subjects themselves. 
Defects                 absolute   -        The number of defects the subject identified 
                                            which were also on our list of known defects. 
Size                    absolute   -        Total number of potential defects identified by 
                                            the subject. 
Defect detection rate   ratio      -        Number of real defects identified divided by 
                                            the total number of defects in the document. 
Adjusted size           ratio      -        Number of assumed defects (size) divided by 
                                            the total number of defects in the document. 
Conformance             ordinal    -        Conformance measured using time, defect 
                                            detection rate, and size as parameters. 


Table 2. Dependent variables. 
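
The two rate variables in Table 2 are simple normalizations of the raw counts. The following is a minimal sketch in Python, assuming the seeded-defect totals reported for the two documents in this experiment (29 for ATM, 27 for PG). The function and variable names are illustrative and are not taken from the original study.

```python
# Seeded-defect totals per document, as reported for this experiment.
SEEDED_DEFECTS = {"ATM": 29, "PG": 27}


def defect_detection_rate(document: str, real_defects_found: int) -> float:
    """Real (seeded) defects identified, divided by the defects in the document."""
    return real_defects_found / SEEDED_DEFECTS[document]


def adjusted_size(document: str, potential_defects_reported: int) -> float:
    """All potential defects reported (size), divided by the defects in the document."""
    return potential_defects_reported / SEEDED_DEFECTS[document]


if __name__ == "__main__":
    # Hypothetical subject: 9 seeded defects found among 15 reported on PG.
    print(defect_detection_rate("PG", 9), adjusted_size("PG", 15))
```

Dividing by the document's seeded-defect total is what makes scores on the ATM and PG documents comparable, since the two documents do not contain the same number of defects.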

2.4 Preparation and Execution of the Experiment 

This experiment was based on the UMD experiment referred to earlier, and much of the 
experimental material, documents, and forms were reused, in addition to the process of 
Perspective-Based Reading as explained earlier. The two documents read by the subjects were 
completely unchanged from the UMD versions, and were: 

• A specification for an automated teller machine network, called the ATM document. 
The latest version available as of 19th April 1996 was used. The document was 16 
pages long and contained 29 seeded defects. 

• A specification for a parking garage control system, called the PG document. Again, 
the latest version available at the time of the experiment was used. This document was 
17 pages long and contained 27 seeded defects. 

The defect lists applied were also the same as in the UMD experiment. As for the forms, 
we used only one type of form for all subjects regardless of process type. We could do 
this because we applied only one perspective, while the UMD experiment investigated all 
three perspectives of PBR. 

Since there was no pretest of the subjects, they were assigned to the blocks randomly. The 
subjects were split into two separate groups when they received orientation and training before 
the experiment. They were not told about the hypothesis or about the differences in the 
processes. All subjects received the same type and amount of training. After the training session, 
which was one hour for each of the two groups, the subjects had one week to read the 
documents and mark the defects. However, they were instructed not to use more than 1:45 hours on 
each document. When the subjects had read both documents, they returned them to us for 
scoring. Two persons scored the documents independently, and then resolved any conflicts by 
discussing each disagreement. 

Finally, we removed the outliers from the data set. First, three subjects who failed to show up 
were removed. Next, those reporting no effort spent, i.e. no time used to find defects, were 
removed. Finally, subjects who found no defects, even though they reported some effort 
spent, were removed if their documents showed no clear signs of being read. 
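
The three exclusion rules above can be expressed as a single filter. The sketch below is only an illustration: the `Subject` record and its field names are hypothetical, and the judgment of whether a document "shows signs of being read" is assumed to have been made by the scorers and recorded as a flag.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    showed_up: bool               # hypothetical field: did the subject take part?
    minutes_spent: float          # self-reported effort
    defects_found: int            # defects matching the known-defect list
    shows_signs_of_reading: bool  # scorer's judgment of the returned document


def keep_subject(subject: Subject) -> bool:
    """Apply the three exclusion rules in the order described in the text."""
    if not subject.showed_up:
        return False
    if subject.minutes_spent == 0:
        return False
    if subject.defects_found == 0 and not subject.shows_signs_of_reading:
        return False
    return True


def remove_outliers(subjects):
    """Return only the subjects that survive all three rules."""
    return [s for s in subjects if keep_subject(s)]
```

Note that a subject who found no defects is kept as long as the document shows signs of having been read, which matches the last rule in the text.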


3.0 Results of the Experiment 

In order to test the first hypothesis, i.e. whether the process modifications caused improved 
process conformance, the eight samples were first compared all at once to determine if they 
could all be assumed to come from the same population. Here, the Kruskal-Wallis test was 
used (Siegel, 1988). The test indicated that the null hypothesis could not be rejected (p=0.5), 
meaning that there were no significant differences between any of the eight samples, and thus 
technique could not be considered to have any effect on process conformance. This was 
confirmed by grouping subjects using the same technique into two samples and testing the difference 
using the Wilcoxon-Mann-Whitney test (Siegel, 1988). The null hypothesis for this test was 
that the two samples were drawn from the same population, while the alternative hypothesis 
was that the sample using MPBR scored lower on the conformance measurement. The null 
hypothesis was not rejected (p=0.39). The medians for process conformance for the eight 
samples are shown in the chart in Figure 4, and illustrate the similarity between the samples. 

Figure 4. Effect of process modifications on conformance median. 
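
For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic in plain Python. It is a textbook formulation without tie correction, not the authors' analysis script, and any data passed to it here would be hypothetical rather than the experimental data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = [value for sample in samples for value in sample]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    offset = 0
    for sample in samples:
        rank_sum = sum(ranks[offset:offset + len(sample)])
        weighted_rank_sums += rank_sum ** 2 / len(sample)
        offset += len(sample)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)
```

For samples that are not too small, H is compared against a chi-square distribution with k-1 degrees of freedom, where k is the number of samples (seven degrees of freedom for eight samples). A small H, and hence a large p-value, means the samples cannot be distinguished.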


The second hypothesis focused on the association between the simplified measure of 
process conformance, as given by Equation 3, and deviation in product quality, as measured by 
Equation 2. The assumption was that subjects who were not following the process correctly, as 
indicated by a high deviation value, would not deliver a product that was close to the average 
of the sample. This is the principle on which many process improvement approaches are based: 
by reducing the variance in process execution, a more stable process performance is 
ensured. 

To test this hypothesis, the Spearman rank-order correlation coefficient (Siegel, 1988) was 
computed and used to decide whether the null hypothesis, which states that the two variables 
are not correlated, could be rejected. The null hypothesis was rejected (p=0.0010), 
meaning that at a significance level of α=0.05, the two variables can be considered 
significantly associated. The two variables are plotted in Figure 5. 



Figure 5. Association between conformance and quality deviation. 
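
The Spearman rank-order coefficient used for this test can be sketched as the Pearson correlation of the ranks, which also handles ties. This is a generic illustration in plain Python; the function names are ours, and no data from the experiment is reproduced.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks of x and y."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    return covariance / (spread_x * spread_y)
```

Because only ranks enter the computation, the coefficient suits ordinal measures such as the conformance measure listed in Table 2.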

4.0 Conclusion 


Based on the preceding tests, we can conclude that the suggested modifications, which were 
assumed to increase the level of process conformance, had no effect. However, a significant 
correlation was detected between process conformance, as measured by the simplified measurement 
given by Equation 3, and the deviation in product quality, as given by Equation 2. 

A limitation of this experiment is that the subjects, being students, cannot be 
considered representative of the population of professional programmers, especially 
considering that the experiment was carried out as a compulsory assignment. A consequence of the 
experimental situation could be that subjects assigned the modified version of PBR 
developed reactive effects due to the presumably high workload of also delivering an 
intermediate product. Thus, we have a potential interaction effect between the treatment and the 
sample, combined with possible reactive effects due to the experimental environment, meaning that 
external validity may be compromised. 

In the case of the association between process conformance and deviation in product 
quality, the threat might be less relevant since we are essentially comparing two kinds of 
deviations. Still, whether the two variables are significantly correlated in other 
populations and environments can only be determined empirically. 

This paper approached process conformance from an experimental point of view, i.e. we 
considered lack of conformance a problem in software process experiments. However, it is a 
problem in other contexts as well. One of the major problems in software development is lack of 
predictability; this problem may be reduced by achieving a more stable product quality 
through controlling process conformance. Proper process conformance is also necessary to 
reuse experiences effectively, both within one organization and across different organizations. 
Thus, process conformance may be considered an important aspect of process quality. 

In the experiment described here, we attempted to influence process conformance by 
modifying the process. However, we can imagine various other ways of influencing conformance, e.g. 
by education and training, or by control and enforcement. The way of improving process 
conformance must be related to the context in each case. The approaches that are beneficial in 
an experimental context may differ from those that are beneficial in a development context. 

Experimental context is assumed. 

Conformance in Experiments 



Are the processes carried out the way we think? 


The Deviation Vector 


• Observations indicate what’s important. 

• Use as dimensions in a vector - parametrized model. 



• Deviation is difference between Execution and Prediction. 

• Rules for combining task deviations to obtain process deviation. 

The deviation vector is a model of conformance. 

Measuring Conformance 


• Define measurement based on deviation vector. 

• Differences as fractions, independent of scale. 



• Then, dimensions in the deviation vector may be compared. 

• Can define measure of process conformance. 

Process conformance: Length of deviation vector. 
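
The bullets above describe the measure only in outline. One possible reading, offered as a sketch and not as the paper's Equations 2 and 3 (which are not reproduced in this section), takes each dimension's deviation as the fractional difference between execution and prediction and uses the Euclidean length of the resulting vector. The dimension names and values below are hypothetical.

```python
import math


def deviation_vector(predicted, executed):
    """Fractional deviation per dimension: (execution - prediction) / prediction."""
    return {
        name: (executed[name] - predicted[name]) / predicted[name]
        for name in predicted
    }


def deviation_length(predicted, executed):
    """Euclidean length of the deviation vector; 0.0 means no deviation."""
    deviations = deviation_vector(predicted, executed)
    return math.sqrt(sum(d * d for d in deviations.values()))


if __name__ == "__main__":
    # Hypothetical prediction and observed execution for one subject.
    predicted = {"time": 100.0, "defect_detection_rate": 0.5}
    executed = {"time": 110.0, "defect_detection_rate": 0.4}
    print(deviation_length(predicted, executed))
```

Expressing each difference as a fraction removes the unit of measurement, which is what allows minutes and rates to be combined in one vector. Whether the original measure weights the dimensions equally is an assumption made here.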


Process Modifications 


Enabling effective measurement 

• Observations reflecting process characteristics. 

• Intermediate products. 

Improving conformance 

• Remove ambiguities and reduce room for interpretation. 

• Suggest process steps. 

• Explicit and specific. 

• Training, teaching, representation. 


Can the process be modified to improve conformance? 

Experimental Study 


• What to compare with? 

• Validity vs. usability. 

• What are the relations in the empirical system? 

• Two aspects: Modifications, and conformance measurement. 



Fractional factorial design, students as subjects. 


Variables and Hypotheses 


Variables 

• Technique, document order, and document type. 

• Measurements: Time, defect detection rate (“quality”), total 
number of defects found (“size”), intermediate product quality. 

Hypotheses 

• The modifications improve process conformance. 

• The modifications lead to reduced product quality. 

• The modifications lead to reduced product quality deviation. 

• Conformance is associated with deviation in product quality. 


Improved conformance? 

Effect on Conformance 

[Chart: conformance for each combination of order and document: ATM-PG/ATM, ATM-PG/PG, PG-ATM/ATM, PG-ATM/PG] 


No significant difference. 


Effect on Defect Detection Rate 



[Chart: defect detection rate for each combination of order and document: ATM-PG/ATM, ATM-PG/PG, PG-ATM/ATM, PG-ATM/PG] 


Both significantly worse on second document. 

Conformance 


Effect on Deviation in DDR 



No significant difference 


DDR Deviation vs. Conformance 



Observation 


Significant association. 

Conformance in Experiments 


Pros 

• Reduced variability in process - improved statistical validity. 

• Ensure/measure conformance - improved construct validity. 

Cons 

• Process modifications may be necessary. 

• Temporary or permanent modifications? 

• Interaction effects with technique type. 

• Conformance at a lower level - where to stop? 


Useful when obtaining knowledge? 


Conformance in Software Development 


Pros 

• Reduced variance in product characteristics - better control. 

• Improved predictability. 

• Ensure valid process-related knowledge. 

Cons 

• Sensitive data collected. 

• Reactive effects - data could be misused. 

• Bureaucracy - administrative overhead. 

• Reduced performance. 


Useful when applying knowledge. 

Findings 

• Quality deviation and process conformance are correlated. 

• No significant effect from modifications. 

• Need to test validity further. 

Applicability 

• Conformance may be useful for 

- Experiments. 

- Process improvement. 

- Situations involving knowledge transfer. 


Some benefit, but further investigation needed. 

Abstract. Large corporations are attempting to cut development costs by integrating 
several COTS products to achieve partial or complete business solutions. That the 
benefits are not turning out as expected is slowly becoming a recognized issue. In this 
paper, we address some of the reasons why integration of COTS products is a challenge. 


Introduction 

As corporations move from monolithic single-technology systems to large hybrid distributed systems based 
on multiple technologies, it is becoming increasingly important to understand how these technologies can 
work together. Often, the desire to integrate two technologies arises from a realization that two very 
different programs, when combined, should provide exactly the functionality required for a 
business need. This recognition usually results in what is termed an integration effort rather than a 
development effort. Shortened deadlines and reduced funding result - a practice justified by the fact that so 
much of the functionality already exists. After all, why should we pay money to write a complete solution 
from scratch when we can purchase two relatively inexpensive products (each of which provides half the 
functionality) and link them together? How tough can it be? How long can it take? How much can it cost? 

It does not take much practical experience before one realizes that a tremendous amount of development 
effort is required to combine two programs (or program fragments) to meet a specific business need. The 
projected time and cost savings do not manifest themselves. Answers as to why costs were so high often 
sound like lame excuses: we had to write adapters to get them to work together; this function didn’t work 
with that function as we expected; we had to write additional code to meet some of the requirements 
that weren’t met by either product; the purchased product didn’t have the specific labels used by our 
organization and we had to rewrite them; and so on, ad nauseam. 

The feeling that these are lame excuses does not negate one important point. Real work has been performed 
to make the integration functional. This is a major source of management dissatisfaction with integration 
efforts. Why should integrating two products require almost as much work as building a system from the 
ground up? 

This paper introduces a technology topology as a tool for understanding COTS integration issues. It 
explores the issues by demonstrating the dissonance between two technologies and the extra effort required 
to get them working together. 

A Tool For Understanding 

In early 1995, we started looking at how diverse Object-Oriented concepts worked together to assist in the 
development of large-scale systems. In particular, we wanted to know how we could use our knowledge in 
areas of domain modeling, architectural styles, frameworks, kits, and object-design patterns to ease and 
streamline the development of a system. The result of this effort we termed an Object Topology in the 
sense that it provided a road map of how these technologies worked together. Our result was documented in 
a previous paper [Tepfenhart]. 

Figure 1. Technology Topology and representative technology components. 
[Figure axis: Abstraction Of Representation] 

Since that time, we have extended our investigations to include other development paradigms, including 
conventional procedural programming, relational databases, artificial intelligence, and web technology. In 
the course of our investigations it became very clear that we were dealing with very diverse technologies, 
each of which has its own vocabulary and supports software development in very different ways. It turned 
out that each technology has its own topology - a different road map for getting from requirements to 
working system. 

Technology Topology 

This section of this paper introduces the concept of a technology topology. A topology is a description of 
the properties of a surface. A road map is one example of a topology. Road maps use longitudes and 
latitudes as the basis for organizing the points on the map. Longitude and latitude are the coordinates for the 
topology. 

A technology topology is a road map for using a software programming technology. In a technology 
topology, we use the abstraction of the representation and the application domain dependency as the two 
coordinates by which we organize points on the map. Elements of a technology that use pictures and/or 
natural language are said to have a very abstract representation. Elements in which the representation can 
actually be executed are said to be very concrete. Machine code is very concrete, source code is moderately 
concrete, a design diagram is moderately abstract, and a requirements specification is very abstract. An 
element of a technology that is expressed in terms that have little to do with the application domain is said 
to be application domain independent. Conversely, an element of a technology that is expressed entirely in 
terms of the application domain is said to be application domain dependent. 

We can create a matrix of different elements of programming in terms of their general location in a 
technology topology. In the paragraphs that follow we will explore where the different elements lie on the 
generalized topology. Figure 1 shows the completed topology and representative technology components in 
terms of their approximate placement on the topology. However, different technologies have elements that 
lie in slightly different points on the topology. This will be shown in the next major section. 

A system specification is highly application domain dependent and very abstract. This is because a system 
specification is typically expressed in natural language and deals with application domain concepts. The 
system specification for a billing system is necessarily expressed in terms of words associated with billing 
concepts. A good system specification is essentially technology independent: one is interested in what the 
system will do, not whether it is implemented using O-O or relational approaches. In practice, however, system 
specifications are often expressed in technological terms. 

A domain model is highly abstract and moderately application domain dependent. It is highly abstract because 
graphical representations are often employed. It is moderately domain dependent because the design presents a 
structure that captures application domain concepts. In most technologies, the domain model forms the basis 
of a design. Such a design for a billing system would have components with names such as print_bill, where 
bill is a domain dependent term. 

An architecture is highly abstract and moderately application domain independent. The components of an 
architecture are elements which fit specific architectural styles. The representation is abstract since 
architectures are usually captured in a diagram. Architectures and architectural styles are application 
domain independent because they deal with things like platforms, files, processes, and protocols. None of 
these are described in terms of application domain elements. 

An implementation of an architectural component is a framework. These elements are typically COTS 
systems. A framework is moderately to highly concrete and highly application domain independent. For 
example, a client-server architecture which has PowerBuilder and Sybase components is highly concrete 
and very domain independent. PowerBuilder and Sybase have no concepts built into them of any particular 
domain. The power of such COTS architectural components is that they are application domain independent 
while providing a large degree of functionality. Application concepts have to be added as an additional step 
in development. 

The parts of a program added onto the basic implementation of an architectural component are moderately 
concrete and very application domain dependent. Application domain concepts identified in a domain 
model are captured in a programming language. Programming languages are moderately concrete forms of 
representation (an executable version is easily achieved by compiling and linking). In a programming 
language, concepts are captured in the form of variable names (PO_Number) and operations 
(compute_total_bill). 

An executable program is highly concrete and very application domain dependent. That is, an executable 
program runs and thereby provides the functionality described in a system specification. Executable 
programs are necessarily technology independent. That is, we can’t tell from its executable code whether a 
program was implemented using O-O, relational, or AI technologies. 

There is a final technology component that has only recently become recognized. This component captures 
the ‘Tricks Of The Trade’. These are the programming heuristics, rules, and patterns that describe how to 
recapture one technology component into the representation of a second technology component. 
Programming practices are generally neutral in application domain specificity and neutral in abstraction. On 
one hand, they describe patterns of domain terms. On the other, the patterns are usually in terms of very 
abstract domain terms. They relate highly abstract representations with rather concrete implementations of 
the information. 
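
The placements described in this section can be collected into a small lookup table. The sketch below is only an illustration: the integer scale (-2 to +2 on each axis) and the exact values are our assumptions, chosen to match the qualitative wording of the text, and the domain model is placed as moderately domain dependent, following its explanatory sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    abstraction: int  # -2 = very concrete ... +2 = very abstract (assumed scale)
    dependency: int   # -2 = domain independent ... +2 = domain dependent (assumed scale)


# Approximate placements read off the prose; the numbers are illustrative.
TOPOLOGY = {
    "system specification": Placement(2, 2),
    "domain model": Placement(2, 1),
    "architecture": Placement(2, -1),
    "framework": Placement(-2, -2),
    "business code": Placement(-1, 2),
    "executable program": Placement(-2, 2),
    "tricks of the trade": Placement(0, 0),
}


def side(component: str) -> str:
    """Classify a component by which side of neutral abstraction it lies on."""
    abstraction = TOPOLOGY[component].abstraction
    if abstraction > 0:
        return "abstract"
    if abstraction < 0:
        return "concrete"
    return "neutral"


def components_on(which_side: str):
    """List, alphabetically, the components lying on the given side."""
    return sorted(name for name in TOPOLOGY if side(name) == which_side)
```

Only the ‘tricks of the trade’ component sits at the neutral point on both axes, which is consistent with its role of relating abstract representations to concrete ones.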

Development Geodesic 

If we examine a technology topology, we see that there are islands of technology components that are 
reflected across the line of neutral abstraction. A system specification is reflected by an executable program. 
A domain model is reflected by an implementation of the application domain component. An architecture is 
reflected by a framework. The mapping of an abstract representation to a concrete representation is a 
development activity. 

A development geodesic can be viewed as the path of least resistance in the development of a product from 
a point on the abstract side of the graph to its counterpart on the other side of the topology. Three such 
geodesics exist on the topology that represent the reflections across the line of neutral abstraction. These are 
shown in Figure 2. 

Figure 2. Development Geodesics. 

The dashed path describes the development route for taking a domain model into a set of business code. 
The path traverses through the ‘tricks of the trade’ node - a practice which real developers perform to 
achieve high quality and high performance code. In the object-oriented development world, these tricks of 
the trade are object design patterns which tell how to map an object model into an object implementation. 

The dotted path describes the development route for taking an architectural style into a framework. Again, 
this path traverses through the ‘tricks of the trade’ node. In this case, some of the tricks of the trade are the 
identification of COTS products that provide a basic framework for an application. These include products 
like X-Windows, DBMS, Web Servers, and others. 

The solid path describes the development route for taking a system specification into an operational 
application. While not really drawn, this path takes into account the other two development paths as well. 
The core path travels from system specification to a domain model and then onto an architecture. This is all 
work performed using abstraction representations. From those points, development is being performed to 
map them onto frameworks, sets of business code and then integrating them into an application. The 
development effort required to translate the abstract representations into concrete representations for 
domain models and architectures will follow the paths described previously. 

The lines between nodes represent a kind of effort to tie all these technologies together. The traversal from 
system specification to a domain model is traditionally a design process. A common manifestation of the 
design process is a requirements traceability matrix. A top-level design is mapped onto an architecture. 
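
The geodesics described above can be written down as node sequences, which makes the number of development segments explicit. This is a sketch under our own simplification: the solid path is represented only by its core (system specification to domain model to architecture), since the text says the remainder follows the other two paths, and the names are illustrative.

```python
# Each geodesic is an ordered list of nodes on the topology.
GEODESICS = {
    "dashed": ["domain model", "tricks of the trade", "business code"],
    "dotted": ["architectural style", "tricks of the trade", "framework"],
    "solid (core only)": ["system specification", "domain model", "architecture"],
}


def segments(path):
    """Return the consecutive node pairs (development steps) along a path."""
    return list(zip(path, path[1:]))


def total_segments(geodesics):
    """Count all segments across the given geodesics."""
    return sum(len(segments(path)) for path in geodesics.values())


if __name__ == "__main__":
    for name, path in GEODESICS.items():
        print(name, segments(path))
    print("total segments:", total_segments(GEODESICS))
```

Counting segments this way treats every step as equal, which is a simplification.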


Diverse Technologies 

There are many software technologies that have reached maturity. With maturity, we now try to exploit the 
strengths of each in obtaining critical business solutions. In the following sections, we examine several 
technologies and identify their topologies. In all cases, the axes remain the same - abstraction of the 
representation and dependency on the application domain. The differences among these technologies are the 
specifics of the locations on the topology of the components and the geodesics connecting the points. This 
will become obvious as we describe the topology of each technology. In particular, we will see that they 
differ even in terms of the words used to express basic programming concepts. 

Object Topology 

Object Oriented approaches to solving business problems have resulted in a number of very large, reliable, 
and functional systems. They are becoming the cornerstone of businesses as they demonstrate the ability to 
rapidly adjust to changing business requirements. The topology for the object-oriented paradigm is shown 
in Figure 3. 

Figure 3. Object Topology. 

In an object topology, a domain model is expressed in terms of an object analysis model. In this technology, 
domain elements are captured in the form of objects, relationships among objects, and behaviors which 
objects can exhibit. Class systems for objects are communicated in the form of graphic illustrations with 
relationships expressed as links or attributes. Behaviors are captured in the form of event-trace diagrams. 

Certain architectural styles are common in object systems. In particular, one deals with architectural styles 
such as decision support, requester/provider, and event-driven styles. These are reflected in the types of 
frameworks and COTS available for object systems. In particular, one has MFC from Microsoft, zApp from 
Rogue Wave, ObjectStore from Object Design, and Orbix from Iona, to name a few. 

One area in which there has been a lot of research of late concerns the ‘tricks of the trade’ technology 
component for the object paradigm. This has led to a clear set of specifications concerning how to map the 
object model into source code. These specifications are object design patterns, and the use of these patterns 
is becoming increasingly widespread. 

Relational Topology 

The technologies associated with relational databases map into a topology of their own. Relational systems 
have long held a major role in business applications. The topology for the relational paradigm is shown in 
Figure 4. This topology should be compared with the one for the object paradigm. 


SEW Proceedings 



SEL-96-002 




Figure 4. Relational Topology. 
[Axes: Abstraction Of Representation (Concrete to Abstract) and Dependency On Application Domain 
(Independent to Dependent). Labels recoverable from the figure: Architectural Styles - Client-Server, 
Pipeline; Conceptual Analysis Model; System Specification.] 

In a relational topology, a domain model is expressed in terms of an entity-relation model. In this 
technology, domain elements are captured in the form of tables, relations among tables, and operations over 
table entries. The entity-relation model is expressed in the form of graph illustrations that reflect a table 
view of the world. An entity-relation model is very different from an object model. 

Certain architectural styles are common among relational systems. The most widely known is the 
client-server architecture, in which there is a common server and any number of clients that may be presentation 
systems and/or decision support systems. These architectural elements are reflected in the COTS products 
available for relational systems. DBMSs are one kind of product available on the server side, and client-side 
products like PowerBuilder are becoming more widely used. 

The ‘tricks of the trade’ technology component are being captured in the form of Design Patterns which 
relate how different domain models can be implemented in tables and queries over those tables. 

AI Topology 

AI is often an overlooked technology in obtaining business solutions. However, rule based systems are still 
quite a factor in the software enterprise. The AI topology is shown in Figure 5. In a vein appropriate to the 
fuzzy, heuristic-driven AI world, developers often talk in terms of development heuristics instead of design 
patterns. 

In an AI topology, a domain model is expressed in terms of a knowledge level analysis model. The principal 
concepts involved are facts, predicates, rules, and chains of inference. These elements are usually captured 
in natural language form. 

AI systems are usually implemented as either consultation systems or embedded systems. The COTS 
products are limited to inference engines and development tools that support either mode of operation. 

Figure 5. AI Topology. 

Web Topology 

One of the hottest technologies in the market place today is web technology. This promises to solve many of 
the problems associated with large scale use of applications in non-homogeneous computing environments. 
The browser, available across many platforms, provides a front-end to an application on a back-end 
machine. The topology for web technology is illustrated in Figure 6. 

In a web topology, a domain model is often expressed in terms of pages, forms, and state models. In this 
technology, information is presented as a page of material, a form to be filled out, or as a single snapshot in 
a sequence of pages. Expression of design is achieved using cgi-bin scripts and HTML documents. 

Figure 6. Web Topology. 

In terms of architecture there are client-server systems and hypertext documents. COTS products include 
the servers, the clients, and some authoring tools. There are some COTS systems that provide a page 
generation layer between the web server and permanent data stores (such as DBMSs). 

The development geodesic is currently poorly understood since individuals are still exploring the topology 
and the topology is still undergoing tremendous changes. One of the most common approaches appears to be 
to lay out the basic page appearance and to implement whatever processing needs to be performed as a single 
cgi-bin program on a page-by-page basis. 

Building Hybrid Systems 

It is clear that as systems get larger and more complex, the strengths of any one technical approach will 
fail to meet business needs totally. To counter this, mixtures of technical approaches are being employed. 

If we were to place any two technology topologies one atop the other, we would see that each has the same 
components, but the components are placed in slightly different locations. As suggested in the previous 
section, this is because the different technologies provide different abstractions for expressing application 
domain concepts. The result of mixing two paradigms and trying to treat them as a single technology is 
illustrated in Figure 7. 

A key to understanding the problems associated with mixing technologies is to recognize that two points 
now reside in each area where a single point used to exist. There are now two different kinds of domain 
models, one which is appropriate for one technology and another for the other technology. There are two 
points for architectural styles, each point identifying a set of architectural styles appropriate for the 
individual technologies. There are two points in the frameworks region denoting that there are different 
architectural styles being implemented for the two different technologies (and the fact that there are 
different COTS products). Finally, there are two sets of business code reflecting the fact that two different 
domain models are being captured. 

The significance of this picture becomes most apparent as a result of tracing the development geodesics on 
the topology. The development paths become much more complicated since we have the existing paths for 
each technology and additional segments that have to be added to connect the dual points. It is necessary for 
the connections between dual points to be made so that one ends up with a fully functional application. In 
particular, if the two sets of business code do not integrate seamlessly, then the application won't function. 
In order for the two sets of business code to integrate seamlessly, some sort of integration model must 
exist for the two domain models. 

If one takes the superficial view that each segment of a development geodesic represents some standard unit 
of work that must be performed during development, it is obvious that more work is required to implement 
an application. In fact, one could easily be convinced that mixing technologies can require almost three 
times as much work as implementing from scratch within a single technology. The necessary work could be 
computed as the sum of the work required for one technology plus the work for the second technology plus 
the work for integrating them. 
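
The estimate above can be sketched as a short calculation. This is only an illustration of the superficial "one segment, one unit of work" view: the function names, the unit weight, and the choice of four segments per development path are assumptions made for the example, while the four dual points are the ones named in the text.

```python
# Dual points named in the text: each needs one connecting link
# when two technologies are mixed.
DUAL_POINTS = ["domain model", "architectural style", "framework", "business code"]


def single_technology_work(segments, unit=1.0):
    """Work for one development path, counting each segment as one unit."""
    return segments * unit


def mixed_technology_work(segments_a, segments_b, links=len(DUAL_POINTS), unit=1.0):
    """Work for both paths plus the links that connect the dual points."""
    return (segments_a + segments_b + links) * unit


def work_ratio(segments, links=len(DUAL_POINTS)):
    """Mixed-technology work relative to a single technology of equal path length."""
    return mixed_technology_work(segments, segments, links) / single_technology_work(segments)


if __name__ == "__main__":
    # Hypothetical path of four segments per technology.
    print(work_ratio(4))
```

Under these assumptions the ratio approaches three only when the number of connecting links is comparable to the number of segments in a single path; longer paths with the same four links give a smaller ratio.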
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Figure 7. Mixing Two Different Paradigms. 

Of course, considering each link to represent some standard unit of work is a superficial view. This view 
ignores some of the advantages which one has when COTS products are employed in a system. However, 
the introduction of a COTS product does not eliminate the work entirely. Hence each link does represent 
some amount of effort, but that amount will differ according to technology and COTS products supporting 
it. That is why using a COTS product in an application can be cost effective. 

There is another observation that can be made on the basis of Figure 7. This observation is that while the 
individual paths for each technology can be well known and understood, the little development links 
necessary to connect the development nodes can be virtually unknown. This is shown by the fact that there 
aren’t any ‘tricks of the trade’ for linking two technologies. 

In essence, one aspect of using two technologies is identifying how the domain models can be linked 
together, what architectural styles work well together, how to connect frameworks, and how the business 
code can be integrated across technological abstractions. This kind of unique, first-of-a-kind activity is one 
that can require time and money. Furthermore, it carries with it a high degree of risk. 

The trade-off between staying within a single technology and mixing technologies has 
to be made from a total-cost perspective. Staying within a single technology might lead to high costs because 
of the effort of developing major functionality that is not provided in any other way. On 
the other hand, the cost of mixing two technologies may be high because of the development effort for 
each technology and the difficulty of connecting them. 
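
The comparison described here can be written down as a small decision sketch. All names and figures are hypothetical, and the risk margin applied to the integration effort is an assumption added to reflect the first-of-a-kind risk mentioned earlier; it is not a cost model from the paper.

```python
def single_technology_cost(base_development, missing_functionality):
    """Total cost when the missing functionality must be built in-house."""
    return base_development + missing_functionality


def mixed_technology_cost(development_a, development_b, integration,
                          integration_risk_margin=0.0):
    """Total cost of two technologies plus (risk-adjusted) integration effort."""
    return development_a + development_b + integration * (1.0 + integration_risk_margin)


def cheaper_option(single_cost, mixed_cost):
    """Name the option with the lower total cost."""
    if single_cost < mixed_cost:
        return "single technology"
    if mixed_cost < single_cost:
        return "mixed technologies"
    return "no difference"


if __name__ == "__main__":
    # Hypothetical figures in staff-months.
    single = single_technology_cost(base_development=12, missing_functionality=20)
    mixed = mixed_technology_cost(10, 8, integration=6, integration_risk_margin=0.5)
    print(single, mixed, cheaper_option(single, mixed))
```

The point of the sketch is that neither option wins in general: the answer depends on how expensive the missing functionality is relative to the integration effort and its risk.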

Summary 

This paper has presented a tool, the technology topology, for understanding COTS integration challenges. It 
used this tool to describe the relative relationships among the components of a technology. It described how 
development of an application traverses a geodesic across the topology. 

A major section of the paper dealt with the different kinds of software technologies. It identified the basic 
concepts and laid them out on the topology. This was done as a preparation for demonstrating how two 
different technologies fail to overlap on the topology. The naturally arising differences in location on the 
topology were identified as the major source of integration challenges. The paper showed how development of a 
system using two separate technologies could easily require much more work than development from 
scratch using a single technology. 
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We did not cover some issues associated with COTS integration. These issues include a lack of maturity of 
a technology. An example of this is Web technology, which is still in its infancy and in which not 
all of the major technological components have been adequately developed. Another issue deals with a lack of 
maturity in a product. Not all products are equally mature within a technology. This lack of product 
maturity can manifest itself as a lack of basic features or inconsistent application of a technology. Also, 
an immature product can be very buggy. Each of these issues raises additional integration challenges as 
developers try to work around bugs in one product by implementing a feature in another. 
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Today’s Talk 


• The COTS Integration Problem 

• Technology Topologies 

- A Tool For Understanding 

- Directing & Understanding Development 

- Different Technologies - Different Topologies 

• Insights Into the COTS Integration Problem 
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COTS Integration Problem 



• The COTS Integration 
Problem Arises Whenever 
Someone Observes That 
Their Business Need Can Be 
Satisfied By Two COTS 
Products 

• All They Have To Do Is Just 
Integrate Them Together 




• A Few Quick Questions: 

- How tough can it be? 

- How much can it cost? 

- How long can it take? 

• A Few Hard Won Answers: 

- It can be very tough 

- It can cost a LOT 

- It can take a long time 
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Reasons Sound Like Excuses 


• One Product Assumed Certain Things 
Inconsistent With The Other Product. 

• We Had To Write A Lot Of Code To Get 
Them To Talk To Each Other. 

• We Had To Modify Them To Talk With 
Our Other Systems. 

• The Presentation Of Information Didn’t 
Really Follow Our Corporate Standards. 



• Real Effort Was Expended To Get It To 
Work 

• There Are Real Reasons Why It Is Tough 
To Integrate COTS software! 
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A Tool For Understanding Why 


• Technology Topology 

- A Roadmap Relating Different Technology 
Components 

- Development Methods Are Geodesics For 
Traversing The Topology 

- Clarifies Integration Problems 
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General Comments 


• System Requirements Are Independent of 
Technology 

• An Application Is Just An Application 

• Different Architectural Styles Support 
Different Technologies 

• Some Points Of Topology Are In Very 
Different Locations 
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Is It A New Paradigm? 


Union Of The Two Technologies Plus A Little Bit From Their Integration 
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Entirely New Development Path 


Factors Not Addressed 


• Maturity Of Technology 

- No Development Path Defined 

- Missing Technology Components 

• COTS Product Maturity 

- Bugs 

- Inconsistencies 

- Technological Disconnects 
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Summary 


• COTS Integration Issues Arise Naturally 

• They Are Complex In The Sense That They 
Are Present At Each Step In Development 

• Vendors Can Help By Providing Hybrid 
Development Paths 
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Introduction 


The problem faced by many of today's software engineers is to build and maintain broad fam- 
ilies of large systems in a cost-effective and timely manner. Because the market demands rapid 
creation and modification of systems in response to a spectrum of evolving requirements, exten- 
sive flexibility in systems is required. This situation has two implications: first, basic system 
demands have to be met quickly; and, second, responses to requested variations have to be rapid 
and effective. System development and modification cycle time must be shortened significantly. 

One approach to cycle-time improvement that has been studied extensively is software reuse. 
Current reuse techniques include system synthesis using application-generator technologies and 
component-based development techniques. The latter has been effected in several ways, including 
subroutine libraries, templates, and a variety of class and framework mechanisms. 

On the basis of some experimental systems work, we suggest that a relatively new approach 
might merit increased attention from the research community. The approach is based on the inte- 
gration of large, application-scale, binary components. To date the approach has been employed 
industrially using shrink-wrapped packages, such as Microsoft Office and Visio Corporation’s 
Visio technical drawing tool, mostly for business and office automation tasks. 

We have shown that this approach can be applied more aggressively, using today’s technol- 
ogy combined with advanced integration strategies such as mediators [S94], to develop systems 
in at least one domain far removed from business data processing, quickly and at low cost. Our 
demonstration application is a fault-tree analysis tool embodying new analysis techniques devel- 
oped by Joanne Bechta Dugan at the University of Virginia. 

We cannot infer broad generality from a single example. However, it does appear that our 
approach can be applied in developing a range of modeling and analysis tools using existing 
technology. The approach does appear to overcome some previously encountered impediments to 
large-scale reuse [G95]. That, however, is not the main point of this abstract. More importantly, 
our success applying the approach in an engineering domain suggests the hypothesis that we can 
thoroughly characterize, develop, and generalize it, so as to enable its application to solve 
problems in a significantly wider variety of problem domains. 

A key problem is that we do not yet understand very well what features of the approach 
account for its success even in the limited domains of business data processing and tools. We have 
decided to focus part of our research on answering this question. A first objective is to determine 
what general features of the approach account for successes to date. A second objective is to 
determine what is required to develop and generalize the approach, so as to apply it more 
aggressively and systematically to problems in domains in which the existing technology is inadequate. 

In this abstract, we present early answers emerging from our attempts to determine which gen- 
eral design properties account for the success of this large-scale integration approach. We begin 
with an analysis that suggests why it seems to offer perhaps greater promise than previous, 
smaller-scale reuse approaches. 


Component Size 

It can be argued that reuse in which engineers attempt to develop systems by reusing small 
building blocks does not attack the essence of the problem. Consider, for example, a system that 
ends up being one million lines long. Even if the entire system is built with C++ reusable classes 
and the classes are, say, 100 lines long on average, the total number of items being composed is 
10,000. As well as understanding and using the classes themselves, the system developers have to 
design and implement the interconnections among the components, and maintain intellectual 
control over these interconnections. That is a massive design task that would appear to be inconsistent 
with fast cycle time and low cost. Developing a system with such building blocks remains a 
tremendous challenge, and the total software engineering burden has been reduced somewhat, but not 
enough, by reuse of small building blocks. 

Achieving truly significant benefits from component-based reuse would appear to require the 
reuse of massive components so as to enable large systems (for example, one million lines) to be 
constructed by straightforward integration of just a few components. With this goal in mind, it is 
clear that components that average 100 lines in length are too small by about three orders of 
magnitude. Despite the obvious benefits, attempting to reuse large components has met with only 
limited success. One problem, as reported by Garlan et al. [G95], is that significant difficulties can 
arise with what has been referred to as architectural mismatch; but this is by no means the only 
problem. 
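
The arithmetic behind this argument can be checked with a short sketch. The function names are illustrative, the pairwise count is only an upper bound on possible interconnections (an assumption used to convey the scale of the design task), and "ten components" is an assumed reading of "just a few components".

```python
import math


def component_count(system_lines, average_component_lines):
    """Number of components needed to make up a system of the given size."""
    return system_lines // average_component_lines


def max_pairwise_connections(components):
    """Upper bound on interconnections: every component linked to every other."""
    return components * (components - 1) // 2


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the size ratio."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    small = component_count(1_000_000, 100)      # 100-line classes
    large = component_count(1_000_000, 100_000)  # assumed 100,000-line components
    print(small, max_pairwise_connections(small))
    print(large, max_pairwise_connections(large))
    print(orders_of_magnitude(100_000, 100))
```

With 100-line classes the designer must keep control of 10,000 items and up to tens of millions of potential connections; with ten components of 100,000 lines each, the corresponding figures are 10 and 45.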

An even more aggressive view is that successful development based on the reuse of massive 
components is unlikely to be realized by incremental improvements in the size of typical reusable 
components from their present small size. Is it necessary for larger component sizes to come 
about only incrementally as more is learned about building flexible components? The experiment 
that we are conducting has shown the feasibility of using massive components today, and suggests 
that an immediate transition to the use of massive components is possible, at least in certain cases. 
As an alternative to trying to make progress by “climbing up” from the use of small components, 
we suggest starting with a massive component approach and “working backwards” as difficulties 
are encountered. 

Our use of massive components is different from the way in which components are used in a 
traditional systematic reuse approach. The components that we are using each provide tremen- 
dous functionality, and each is many hundreds of thousands (possibly millions) of source lines in 
length. Despite this, we have found the integration of these massive components to be successful 
in the senses that they were easy to use and the resulting product performs as required. In view of 
their size and functionality, we think that it is important to distinguish between the more tradi- 
tional notion of component and the type of massive component that we are using. We have coined 
the term “application service” to describe the latter and will refer to such components using this 
term throughout the remainder of this presentation. 

Using Application Services 

In an earlier paper [SK95], we reported preliminary results of an experiment on large-scale 
systematic reuse. We described our experience with efforts to exploit an architecture (Microsoft’s 
OLE) that permits very large components to be integrated. We used this architecture enhanced 
with mediators [S94] and several application services to develop a high-quality, industrial- 
strength software toolset. Our conclusions were that the basic architectural concept worked well, 
although several technical difficulties remained. 

The high-level architecture of the toolset that is the subject of the experiment that we are 
conducting is shown in Fig. 1. The toolset provides facilities for a technique called system fault-tree 
analysis, a technique used in reliability engineering [V81]. A main program is responsible for 
providing the user’s primary control mechanism, and also initiates execution of the required 
application services. Three application services are used: Visio Corporation’s Visio technical 
drawing program, Microsoft’s Access database program, and Microsoft’s Word text processing 
program. Visio is used to provide a graphic representation using customized icons of the fault tree 
of interest together with a graphic (click and drag) editing facility. Access provides a general 
database facility that is used for storing fault trees and various forms of failure data used in the 
analysis. Word is used to edit an ASCII representation of fault trees that is useful for certain kinds of 
fault-tree creation and editing. These three application services are supplemented with mediators 
that provide links between the application services and certain canonical data structures 
maintained by the main program. Critical reliability analysis functions are available to the main 
program in a conventional form as a set of classes (shown in Fig. 1 as the computation kernel). 

Figure 1 - Toolset architecture. [Figure label: Application Virtual Machine] 

The three application services form what we refer to as an application virtual machine. It is 
this virtual machine that the main program manipulates, along with the computational kernel, to 
provide the toolset’s functionality. This manipulation uses subprogram calls as might be used in a 
traditional design together with action invocation via events. 
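
The structure described above, in which a main program owns canonical data and mediators link it to the application services through events, can be sketched as follows. This is a minimal illustration only: every class and method name is hypothetical, the three services are plain stand-ins rather than Visio, Access, or Word, and nothing here uses or represents the OLE interfaces of the actual toolset.

```python
class Event:
    """Minimal event: subscribers are called whenever the event is announced."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def announce(self, *args):
        for handler in list(self._handlers):
            handler(*args)


class FaultTreeModel:
    """Canonical data structure maintained by the main program (hypothetical)."""

    def __init__(self):
        self.gates = {}
        self.changed = Event()

    def set_gate(self, name, kind):
        if self.gates.get(name) == kind:
            return  # nothing changed, so no event; this also prevents update loops
        self.gates[name] = kind
        self.changed.announce(name, kind)


class ApplicationService:
    """Stand-in for a large packaged application with its own view of the data."""

    def __init__(self, label):
        self.label = label
        self.view = {}
        self.edited = Event()

    def show(self, name, kind):
        """Update the service's view without announcing an edit."""
        self.view[name] = kind

    def user_edit(self, name, kind):
        """Simulate the user editing the fault tree inside this service."""
        self.view[name] = kind
        self.edited.announce(name, kind)


class Mediator:
    """Keeps one service consistent with the model; neither refers to the other."""

    def __init__(self, model, service):
        model.changed.subscribe(service.show)
        service.edited.subscribe(model.set_gate)


def build_toolset():
    """Main-program setup: one model, three services, one mediator per service."""
    model = FaultTreeModel()
    services = {label: ApplicationService(label)
                for label in ("drawing", "database", "text")}
    mediators = [Mediator(model, service) for service in services.values()]
    return model, services, mediators


if __name__ == "__main__":
    model, services, _ = build_toolset()
    services["drawing"].user_edit("G1", "AND")
    print(model.gates, services["text"].view)
```

In this sketch an edit made in one service reaches the others only through the canonical model, so the services remain independent of one another and the integration logic is confined to the mediators.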

The toolset that we built demonstrates industrial strength functionality and performance. 
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Some cosmetic elements of the individual application services remain. The services do provide 
some support for customizing or removing application-specific interface elements. We intend to 
remove those that are not needed, to the extent that this improves the toolset’s coherence and 
appearance, and to the extent supported by the existing application packages. Certain key func- 
tional elements (such as editing commands) will be left available through the application services’ 
own user interfaces so that they maintain the look and feel of similar products. How best to sup- 
port integration at the application service user interface level remains a technical — and perhaps a 
research — issue that has not yet been fully resolved. 


Reuse Analysis 

In terms of reuse, the results we achieved were successful — we believe significantly more 
successful than might be expected since the toolset was built using application services. Why is 
this the case? In this section we present the results of a preliminary analysis of this success. 

The application services provide enormous functionality including large and important parts 
of the basic functionality of the domain. Not only is this functionality provided but it is provided 
with sufficient flexibility that it can be tailored easily to the specific needs of an individual prod- 
uct. We refer to this functionality as the critical superstructure of products in the domain. It is this 
aspect of many products that consumes the vast majority of the resources yet is not what provides 
the unique capabilities of the product. In the case of the toolset we have used in our evaluation 
experiment, for example, many parts of the toolset are commonly found in software tools. 

That the flexibility offered by the application services was not overwhelming is 
counterintuitive. Many efforts to generalize components to meet a variety of needs have resulted in components 
that are unwieldy. Support for advanced specialization mechanisms seems to play a key role in 
this regard. Although it is possible to specialize Visio by writing custom code in C++, it is more 
common to exploit its spreadsheet mechanisms to “program” the behaviors of user-defined 
shapes. Similarly, one specializes Word by defining new document templates. These mechanisms 
provide high-level support for flexibility in the dimensions that are actually critical in practice. 

Even with the provision of powerful flexibility features, application services cannot be used 
effectively unless they can be integrated smoothly. The architectural approach used (OLE, 
mediators, and the application virtual machine structure) allowed a set of application services and 
product-specific software elements to be integrated so that the resulting system presents a 
comprehensive unified interface and behavior to the user. This is a significant result since 
integration involves a variety of invocation and data interchange requirements. 

As well as the combination of functionality, flexibility, and integration facilities, a number of 
other complex aspects of both the application services and the integration mechanism contributed 
to the successful reuse that we observed. We summarize briefly the main reasons for the reuse 
success here — more details together with examples will be given in the presentation: 

• Architectural coherence. 

All of the application services used were designed to work in the OLE environment. This per- 
mitted their use in a systematic way and avoided several instances of architectural mismatch. 

• Supplementary use of mediator architecture. 

We avoided many difficulties by supplementing the OLE architecture with use of mediators in 
the toolset design. 


• Provision of the critical superstructure. 

The application services enabled the creation of the critical superstructure relatively easily. 

• Provision of essential flexibility. 

The application services provided flexibility in ways well suited to their use in a reuse context. 

• Advanced support for exploiting flexibility. 

The application services include powerful mechanisms to permit exploitation of their inherent 
flexibility. 

• Managed object model. 

The application services provide a managed object model in that they implement an internal 
object structure that is powerful yet accessible from their application-programming interfaces 
thus permitting fine-grained integration. 

• Provision for add-on functionality. 

The functionality of an application service is easily supplemented by creating the requisite 
additional functionality as a software entity that can be invoked by the user in a number of dif- 
ferent (and powerful) ways via the application service. 


Conclusion 

Our conclusion is that by careful design of both application services and the architecture with 
which they are integrated, large systems can be built successfully using components up to three 
orders of magnitude larger than components found in typical reuse libraries. We know of no com- 
parable results demonstrating the degree of integration we have achieved at this scale. Our result 
has not been proven generally, but it has been demonstrated. The demonstration is sufficiently 
successful, and the reasons why understood well enough, that increased attention to the technical 
and research issues that have to be resolved to generalize the approach appears to be warranted. 
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Goal: Radical Improvement 

• Productivity 

• Quality 

• Cycle Time 
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Means 


• Novel Architectural Styles 
• For Leading Edge, Real Systems 

• Respectful of Key Design Realities 

• Explored & Demonstrated by Case Studies 


Problem Domain: Tools 

• Even Simple Techniques Demand ... (10^4) 

• Massive Superstructures (10^6) 

- graphical user interface 

- technical drawing 

- text formatting 

- data management 

• From-Scratch Construction Uneconomical 
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Case Study 

• Given New Modeling & Analysis Techniques 

• Develop Industrial Strength Software Tools 

• At Radically Reduced Cost & Cycle Time 

• Dugan’s Hybrid Fault-Tree Analysis Method 


Traditional Reuse Inadequate 

• E.g., Object-Oriented/Libraries 

• 1 Million Lines of Code (10^6) 

• 100 Line Reusable Components (10^2) 

• Need 10,000 Components (10^4) 

• Still A Horrendous Design Problem 

• Doesn’t Attack Essence of Problem 

• Result— Many Terribly Inadequate Tools 
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Attacking the Essence 

• Simple and Straightforward Integration 

• Of a Few Parts (10^1) 

• Tailored Quickly and at Low Cost 


Observation 

• Powerful New Applications 

- Microsoft 

- Visio Corp. 

- Others 

• Specializable 

• Integratable 

• Key Subdomains 
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Concept 


• Application Packages as Components 

• Application Virtual Machines 

• Package-Oriented Programming (POP!) 


An Old Idea 


“Perhaps the simplest instance of reusability (and the one 
with the highest leverage) is the purchase of an existing 
software package. The purchasing organization pays very 
little compared to building an equivalent capability in-house 
and it is up and running in a short time. Even if a limited 
amount of customization is necessary, this is often small 
compared to the cost of building an entirely new system. 

If organizations will come to the point of accepting such 
prepackaged systems, then a major step forward will have 
been achieved [Horowitz & Munson, IEEE TSE, 1984] .” 
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Still A Good Idea 


“An especially promising trend is the use of 
mass-market packages as the platforms on 
which richer and more customized products 
are built [Brooks, MMM, 1995]” 


Questions of Feasibility 

• “The [programmer] who uses ... applications 
as components ... is the user whose needs are 
poorly met today [Brooks, MMM, 1995].” 

• Architectural Mismatch [Garlan 95] 

• Lack of Demonstrated Success 
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Hypothesis 


Workable Basis for Mega-Reuse 


Evidence 


Appears to Work for Tools 
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What Made It Work? 

• Right Basis Components 

• “MightyMorphic” Components 

• High Valence Components 
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Right Basis For Tools 


• Technical Drawing 

• Text Management 

• Data Management 

• Domain-Specific Language 

• Domain-Specific Types 

• Domain-Specific Analysis 


“MightyMorphism” 

• Flexibility in Critical Dimensions 

• High-level Specialization Mechanisms 

• Provisions for Add-on Functionality 

• Control Over User Interfaces 
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High Valence 

• Architectural coherence (OLE) 

• Application Programming Interfaces 

• Managed object model (Visio) 


Conclusion 

• We Demonstrated Effective Mega-Reuse 

• “Order of Magnitude Better Tool” --Dugan 

• Promising Architectural Concept 

• Investigation of Generalizability Warranted 
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TECHNOLOGY EVOLUTION: 
COTS Transition at Raytheon 1983-1996 

(Part I) 




Tom Lydon, Laurie Fischer, Karl Gardner 
Raytheon Company 


ABSTRACT 

The Raytheon RES Software Engineering Laboratory is a large software development 
organization consisting of about 1200 engineers. It has been independently rated as an SEI 
Level 3 site for four years and won the IEEE Software Process Achievement Award in 
1995. The Raytheon process relies extensively on the use of integrated engineering tools to 
achieve process control. Data on 84 tools, 25 Raytheon-developed and 59 COTS, over the 
period 1983-1996 shows that the number of tools used in software engineering has grown 
from an average of about 4 tools per engineer in 1986 (not including standard host editors 
or compilers provided with the OS) to about 12 tools per engineer in 1996. Furthermore, 
over this period there was a definite, systematic swing from Raytheon-developed tools 
which were predominant from 1984-1990, to COTS tools which have been predominant 
since 1990. The current mix is about 3 Raytheon tools and 9 COTS tools in use per 
software engineer. 




There are pros and cons to the use of COTS tools. Overall costs have initially gone up, but 
they are still expected to go down in the future, though this is not certain. Standardization 
is one method of controlling overall costs. Productivity and quality both appear to improve 
with the use of COTS tools, but this improvement data is also a result of other factors such 
as process initiatives, training, and better hardware (workstations). There are regular births 
and deaths of tools, and this "churning" must be managed. The data suggest that the 
overall use of tools will level off over the next few years at about 13 tools per engineer, 2 
Raytheon and 11 COTS. COTS tools are not a panacea, but they are here to stay. 

This paper is the initial portion of a study of overall COTS tool costs. Data on the use of 
Raytheon-developed and COTS tools is included, but life cycle cost data has not yet been 
collected. The second portion of this study will be completed early in 1997. 


BACKGROUND 

The Raytheon Electronic Systems (RES) Software Engineering Laboratory (SEL) is a 
large, diverse software development organization, geographically distributed across eight 
major sites in six different states (primarily Massachusetts). This laboratory develops 
software for the primary RES business areas of Command & Control Systems; Naval, Air 
to Air, and Strike Systems; Air Defense Battle Management and Radar Systems; 
THAAD/Ground Based Radar; and Transportation Systems, including Air Traffic Control. 

The Raytheon RES Software Engineering Laboratory has been independently rated as an 
SEI Level 3 site for four years and won the IEEE Software Process Achievement Award in 
1995. The Raytheon process relies extensively on the use of automated, integrated 
engineering tools to achieve process control. The number of software engineers in the 
current RES SEL has grown from about 600 in 1983 to about 1200 in 1996. 
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DATA ON TOOLS 


During the 1980's, Raytheon had three major software laboratories located at Missile 
Systems Division in Bedford and Burlington MA, at Equipment Division in Sudbury, 
Wayland, and Marlboro MA, and at Submarine Signal Division in Portsmouth RI. These 
divisions employed about 600 software engineers in early 1983. The three divisions were 
consolidated into a single division, Raytheon Electronic Systems (RES) in early 1995, and 
the three software laboratories are now consolidated into a single Software Engineering 
Laboratory (SEL). 


In the 1980's, there were three types of tools used: 

(a) Raytheon-developed internal tools 

(b) Tools provided "free" with operating systems, such as yacc, lex, lint, 
curses, and troff with Unix, and CMS with VMS 

(c) Purchased third-party tools, now known as COTS 

Raytheon's policy was always to encourage purchasing category (c) wherever possible, in 
preference to building our own in category (a), but the fact was that most of our 
requirements could not be met by COTS tools, so the majority of tools were developed 
internally. The data included in this study is from categories (a) and (c), and does NOT 
include data on the use of tools that were provided standard as "no cost" components of the 
operating system. 

Also in the 1980's, there were four sources of funding for tool acquisition or development: 


SOURCE         Approx % of Funding    COMMENT 

Corporate      40%                    - interdivisional initiative to aid many projects 
                                      - used mainly for tool development 

Cost Center    30%                    - common tools used by many programs 
                                      - costs centrally absorbed, redistributed to programs 

Program        20%                    - used for tools specific to program needs 
                                      - tools owned by the program, not Raytheon 

Overhead       10%                    - used for productivity improvement tools 
                                      - used for special tools on high-end computers 


As lead engineer for both Corporate and Cost Center tool programs at the time, and through 
regular interaction with programs and overhead tasks, I was able to reconstruct good 
historical data on the extent of use of 84 tools over the period 1983-1996. The goal was to 
analyze the true costs of conversion from internal to COTS tools. Data on tool use has 
been developed, but data on costs has not yet been finalized. 

This study of 84 tools includes 25 Raytheon-developed tools and 59 COTS, mostly for 
computer-aided software engineering (CASE). The data is shown below in a table where 
each row represents one tool, and the columns represent average "Fraction-of-Use" data for 
each year. "Fraction-of-Use" is the decimal fraction of software laboratory engineers who 
used the tool on a regular basis during the year. For example, ".3" means that 30% of the 
engineers used that particular tool (either internal or COTS) on a regular basis that year, so 
if there were 800 total engineers then 240 used the tool. 
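
The "Fraction-of-Use" arithmetic can be illustrated with a short sketch. The function names and the sample fractions are hypothetical. Summing the fractions over all tools gives the average number of tools in regular use per engineer; whether the per-engineer averages quoted in this paper were derived in exactly this way is an assumption.

```python
def engineers_using(fraction_of_use, total_engineers):
    """Number of engineers who used a tool regularly in a given year."""
    return round(fraction_of_use * total_engineers)


def tools_per_engineer(fractions_of_use):
    """Average number of tools per engineer: the sum of all fractions for a year."""
    return sum(fractions_of_use)


if __name__ == "__main__":
    # The example from the text: a fraction of .3 with 800 engineers.
    print(engineers_using(0.3, 800))
    # Hypothetical fractions for four tools in one year.
    print(tools_per_engineer([0.3, 0.4, 0.6, 0.1]))
```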
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TYPE 

CATEGORY 

82 

?? 

— r 84 nr[ 

85 

88 

-87 

gg 

S3 

90 

91 

cp 

88 

91 

85 
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COTS 
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0.1 


j ... 
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RAY 
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0.1 

02 

03 
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RAY 
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0.1 

02 

0.3 

03 

0.3 

03 

03 

0.3 

03 

03 

03 

03 
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0.1 

0.1 

Common 

RAY 

DOCUM 

02 

0.3 

0.5 

03 

0.1 











Common 

COTS 

COST 


0.1 

0.1 

0.1 

0.1 

0.1 

0.1 

0.1 

0.1 

0.05 

0.01 

0.01 

0.01 

0.01 

0.01 

Niche 

COTS 
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0.1 

0.1 

0.1 
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COTS 

CODING 


0.1 

0.1 

0.1 
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DESIGN 


0.1 

03 

0.4 

0.4 

0.4 

0.4 

0.4 

0.3 

03 

0.1 

0.1 

0.05 

0.05 

0.01 

Common 

RAY 

DESIGN 


0.1 

03 

0.3 

0.4 

0.4 

0.4 

03 

0.3 

03 

0.1 

0,1 

0.05 

0.05 

0.01 

Common 

RAY 

CM 


0.1 

03 

03 

0.4 

03 

0.6 

0.6 

0.6 

03 

03 

0.4 

0.4 

03 

0.1 

Common 

RAY 

MAINT 


02 

03 

03 

03 

03 

0.1 

0.1 








Common 

RAY 

MAINT 



0.1 

0.1 

0.1 











Niche 

COTS 

DB 



0.1 

03 
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RAY 
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03 
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COTS 
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03 
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0.1 
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RAY 
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03 

03 

03 
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03 

03 
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0.01 
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RAY 

CM 
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03 
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03 
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03 

0.4 
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0.3 

Common 

COTS 

DB 




0.1 

03 

0.3 
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0.6 
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03 
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0.3 
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COTS 

DB 




0.1 
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RAY 
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COTS 

COTS 

COTS 


TRACE 

DATA 

GUI 

MGMT 

CODING 

DATA 

DATA 

DEFECT 

GIS 

GUI 

DATA 

DESIGN 

MGMT 

CM 

CM 

DOCUM 

DATA 

DB 

COST 

COST 

GIS 

GUI 

DATA 

CODING 

DOCUM 

DEFECT 

REQTS 

REQTS 

DB 

DOCUM 

DOCUM 

GUI 

CODING 

CODING 

CODING 

CODING 

DEFECT 

CM 

DOCUM 

TRACE 

DEFECT 

DESIGN 

TEST 

DOCUM 

DATA 


0.1 

0.1 

0.1 

0.1 

0.1 


0.1 

03 

0.1 

02 

0.1 

0.1 

0.1 

0.1 


02 

03 

0.1 

0.3 

0.1 

0.1 

0.1 

02 

0.05 

0.1 

0.1 

0.1 

0.1 

0.1 

0.1 

0.1 


02 

02 

0.1 

0.3 

0.1 

0.1 

0.1 

0.3 

0.1 

0.1 

0.1 

0.1 

0.1 

02 

0.1 

0.1 


0.3 0.3 

0.2 02 
0.1 0.1 
0.4 OS 
02 02 


0.1 

0.1 


0.1 

0.1 


03 03 

0.05 0.01 
0.1 0.1 
0.1 0.1 
02 02 


02 

02 

02 

0.6 

03 

0.1 

0.1 

02 

0.01 

0.1 

0.1 

03 


03 03 0.8 


0.05 0.05 
0.05 0.05 
0.05 
0.05 
0.01 
0.1 
0.1 
0.1 
0.1 
0.1 


0.1 0.05 

0.05 0.01 
0.05 

0.05 0.05 

0.01 


0.1 

0.1 


0.1 

0.1 


02 03 

02 0.1 
02 02 
0.01 0.01 
0.1 0.1 
0.05 0.1 

0.05 0.1 

0.1 
0.05 
0.1 
0.1 
0.1 
0.1 
0.1 
0.1 


0.01 

0.01 

0.05 

0.1 

0.1 

0.4 

0.1 

0.1 

0.1 

0.1 

0.1 

0.05 

0.1 

0.1 

0.1 

02 

0.3 

0.1 

0.01 

0.01 

0.01 

0.1 

0.1 

0.1 

0.1 

-01 


         Tools   82    83    84    85    86    87    88    89    90    91    92     93     94    95    96
TOTAL     84     0.5   1.6   2.7   4.3   4.9   6     6.9   7.3   8.6   9.45  10.3   10.9   11.3  11.7  12.3
RAY       25     0.4   1.3   3.1   3.1   3.6   4.3   4.6   4.7   4.6   4.4   4.35   4.15   4     3.6   3.03
COTS      59     0.1   0.4   0.3   1.1   1.3   1.7   2.3   2.3   4     5.1   5.9    6.7    7.3   8.1   9.3

The data is usually rounded to the nearest 10% due to inability to measure the data more
precisely. For example, some projects used tools for parts of the year, and other projects 
did not have 100% of their engineers use a given tool at a given time, so annual weighted 
average Fractions-of-Use have been totaled across all programs and rounded to the nearest 
tenth. In some cases, small numbers such as .05 (5%) or .01 (1%) are used to indicate that 
a tool was still in active use, but only by small populations of users. 
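
As a quick illustration of how the Fraction-of-Use figures can be read, the sketch below converts a fraction into a headcount and sums the fractions for one year into an average number of tools per engineer. The function names and the three-tool sample year are hypothetical; only the 0.3-of-800 example comes from the text.

```python
def users_of_tool(fraction_of_use, total_engineers):
    """Convert a Fraction-of-Use value into an approximate number of regular users."""
    return round(fraction_of_use * total_engineers)


def tools_per_engineer(fractions_for_year):
    """Sum Fraction-of-Use over all tools for one year.

    The sum can be read as the average number of tools each engineer
    used on a regular basis during that year.
    """
    return round(sum(fractions_for_year), 2)


# The example from the text: 0.3 of 800 engineers.
print(users_of_tool(0.3, 800))

# Hypothetical year with three tools in use (values are not from the table).
print(tools_per_engineer([0.3, 0.1, 0.05]))
```

Read this way, a yearly total above 1.0 simply means that the average engineer used more than one tool regularly.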

[Figure: Overall Use of Software Tools has Steadily Increased]

[Figure: Use of Raytheon-Developed Tools has Dropped Off]

The data shows a definite, systematic swing from Raytheon-developed tools to COTS
tools. As the use of COTS CASE tools has increased, the use of comparable Raytheon-
developed tools has declined, while the overall use of tools for software engineering has
steadily increased.

[Figure: Counterpoint: Swing from Internal to COTS Tools]

The shift from internally developed to COTS software tools over this period has been
driven by several factors, including:

1. Customer Requirements - Customers are becoming increasingly knowledgeable and 
sophisticated in their requirements for software development. In some cases, they require 
certain specific tools to be used on a program. In other cases, they require a COTS tool be 
used, if not a specific tool. 

2. Need to Improve Productivity - Contractors are in constant competition, and if there is a 
better/faster way to develop software, they must learn and take advantage of it. Many 
COTS tools embody well-documented computer-aided software engineering (CASE) 
methodologies specifically aimed at improving productivity. 

3. Need to Improve Quality - Similar to productivity, competition for improved software 
quality (fewer defects) is acute. COTS tools with CASE methodologies also specifically 
aim at improving quality. 

4. Need to Improve Turnaround Time - Sometimes called time-to-market, this is often more
important than either productivity or quality. A task may cost the same, or it may cost more,
but if you can complete it with the same quality in half the time there is often a premium that
can be gained.

5. Need to Reduce Costs - In-house tools require internal staffing for maintenance, 
upgrades, and support. COTS tools appear now to be mature enough that vendors can get 
a wide enough usage base to defer these costs more cost-effectively than any one user 
could do themselves. Thus there is an opportunity to reduce internal staff (or redirect to 
other projects) and reduce overall computing costs to programs. 

6. Standardization/Integration - By using COTS tools, it is easier to standardize tools 
across organizations (e.g. divisions), and to provide more standard tool integration 
mechanisms, allowing for synergy across tools and programs. 

The primary development environment(s) for Raytheon software development have
evolved gradually over the past 15+ years. In the 1980s there were almost equal amounts
of VMS and Unix based development. From 1990-1995 it was primarily Unix based
(several flavors) development. Since 1995 it has still been primarily Unix development,
but we now see many smaller, more commercial programs beginning to use NT as the
platform of choice.


TYPES OF TOOLS


The data on the 84 tools can be divided into the following categories of tools, by primary
application to the software life cycle:


CM       Configuration Management, code control, automated build tools
         (internal tools have been built on top of SCCS, RCS, and CMS)
CODING   Support for coding, compilers, debuggers, environments
COST     Cost estimation tools
DATA     Data analysis and data reduction tools
DB       Database related tools, e.g., relational or object-oriented tools
DEFECT   Defect collection, tracking, reporting tools
DESIGN   Preliminary and detailed design tools, PDL and graphical
DOCUM    Documentation tools
GIS      Geographic Information Systems tools
GUI      Graphical User Interface tools
MAINT    Maintenance and code documentation tools
MGMT     Management tools
REQTS    Requirements analysis tools (some overlap with prelim. design)
TEST     Test support, test generation, test tracking tools
TRACE    Traceability tools


The mix of tools by category and type (Raytheon or COTS) is shown below: 


Category   Raytheon   COTS   TOTAL

CM             3        3       6
CODING         0        9       9
COST           0        4       4
DATA           2        9      11
DB             0        5       5
DEFECT         6        1       7
DESIGN         3        4       7
DOCUM          2       10      12
GIS            0        2       2
GUI            0        4       4
MAINT          3        0       3
MGMT           2        2       4
REQTS          1        4       5
TEST           1        1       2
TRACE          2        1       3

TOTAL         25       59      84


Raytheon is a large software engineering organization that is NOT primarily a developer of
software tools. From the distribution of data, it is clear that Raytheon has spent relatively
more effort on defect-tracking and maintenance tools, and relatively little effort on coding,
cost estimation, data analysis, database, documentation, GIS, and GUI tools.


INDIVIDUAL TOOL CATEGORIES 

A closer look at a few tool categories reveals some micropatterns in the data, for example:

a) The use of CM tools increased linearly until about 1987 when essentially all engineers
used a CM tool on their project. Since then it has leveled off at about one tool per engineer,
as would be expected. (The amount over one may be due to data rounding up.)



b) The use of CODING tools increased very slowly through the 1980s, probably due to an 
acceptable level of "standard" operating system-supplied tools. The number of CODING 
tools has increased dramatically since 1993, however, indicating an active need for 
integrated coding environment tools, especially for embedded systems development. 



c) The use of DESIGN tools increased quickly to about one per engineer in the mid-1980s,
probably due to immediate productivity and quality gains from the use of these tools. As
expected, it has leveled off at one tool per engineer, since it is rare that a project would need
to actively use two different, independent DESIGN tools per engineer.



d) The use of MANAGEMENT tools has remained at about one-half tool per engineer since
the mid-1980s. This makes sense, since not all engineers need to prepare schedules, track
actuals, and report on project status. Group leaders, software lead engineers, and section
managers need to use these tools, and they make up about 1/3 of the engineering base.



e) The use of DEFECT tools has risen steadily since the mid-1980s. This is partly because
there has been a variety of tools, with projects having their own preferences, sometimes
their own home-grown tools, and no consistent standard across all projects.




f) The use of DATA tools blossomed initially in the mid-1980s, and again in the early
1990s. The first wave was probably due to the need for basic data analysis capability, but
the second wave is more likely due to an increased emphasis on quantitative data collection
and management while going from SEI CMM Level 3 to Level 4, which is what is
happening at Raytheon now.


COTS TOOLS ARE NOT A PANACEA

Despite the well-documented shift to using COTS tools in place of internal tools, there is a 
new set of problems that must be dealt with due to this transition. 

The main tradeoff for a large software contractor is gaining quality tools at a reduction in 
cost (at least that's the promise) versus giving up control over those tools. The table below 
illustrates the principal PROS and CONS of this tradeoff: 


Internal Tools

  PROS
  • Cost is fixed, not proportional to #users
  • Problem fixes usually faster
  • New capabilities can be added faster

  CONS
  • More expensive to develop
  • More manpower required for maint & support
  • Usually lower quality tools
  • Slower initial availability

COTS Tools

  PROS
  • Less expensive to develop
  • Less manpower required for maint & support
  • Usually better quality tools
  • Faster initial availability

  CONS
  • Cost increases almost linearly with #users
  • Problem fixes usually slower
  • New capabilities usually not added quickly


Instead of immediately lowering costs, during transition from internal to COTS tools the 
costs appear to initially go up, due to the startup costs of tool acquisition and the fact that 
legacy tools must still be supported for some period of time on ongoing projects that cannot 
afford to switch mid-stream. A schematic diagram of this short term cost "bubble" effect is 
shown below:

[Figure: schematic of the short-term cost "bubble" during transition]

There are other indicators of a more general, more persistent shift of support costs from
hardware and labor to software, however. In a review of the costs incurred by the cost
center to support about 500 software engineers in the former Missile Systems Division
between 1990-1995, the percentage of costs related to software (purchase, maintenance,
amortization) doubled from about 15% to about 30%, while labor fell from 50% to 30%,
and hardware (purchase, maintenance, depreciation) remained constant at about 25-30% of
total support costs. Other factors (supplies, allocations, etc.) accounted for the remaining
10%.


Software maintenance was the highest growth factor, since it is a function not of 
purchases, but of total active inventory. These trends are shown in the figure below:

[Figure: Costs Shifting Towards Software (esp. Maintenance)]

One way to help control the number of COTS licenses required, and thus the cost, is to
consolidate concurrent licenses as far as possible onto large license servers. This is not
always possible, for example on classified projects, but there is an economy of scale on
licenses required related to an increasing total number of users.

For example for tools such as Rational Apex or Atria Clearcase, 10 networks of 10 users 
each would require 8-9 licenses on each network for a total of about 85 concurrent licenses, 
whereas 100 users on a single network would require only about 50 licenses, saving 35 
licenses or 40% up front. 
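
The arithmetic behind this example can be checked with a short sketch. The function name is hypothetical, and the figure of 8.5 licenses per small network is an assumption taken as the midpoint of the "8-9" quoted above.

```python
def consolidation_savings(separate_licenses, consolidated_licenses):
    """Return (licenses saved, percent saved) when license pools are merged."""
    saved = separate_licenses - consolidated_licenses
    percent = 100 * saved / separate_licenses
    return saved, percent


networks = 10
licenses_per_small_network = 8.5  # assumption: midpoint of "8-9 licenses"
separate = networks * licenses_per_small_network  # about 85 licenses in total
consolidated = 50  # single network of 100 users, from the text

saved, percent = consolidation_savings(separate, consolidated)
print(f"saved {saved:.0f} licenses, about {percent:.0f}% of the separate total")
```

With these inputs the saving works out to 35 licenses, slightly more than 40% of the 85 that separate networks would need.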

We have observed three classes of tools with different economies of scale based on usage
patterns, referred to here as High, Medium, and Low Saturation tools. "Saturation" is the
number of licenses required as a percent of the total number of users, and reflects how
intensively a tool is used by an engineer. Coding environments tend to be high saturation
tools, since coders usually stay in them most of the day, while occasional use tools tend to
be low saturation tools.

• For large networks, fewer licenses are required per user



• HIGH Saturation tools settle at about 50% licenses per user (e.g. Apex, Clearcase)

• MEDIUM Saturation tools settle at about 25% licenses per user (e.g. Interleaf, StP)

• LOW Saturation tools settle at about 10% licenses per user (e.g. MatLab, SPR)

(Note: SPR is a Raytheon-developed internal problem reporting tool) 
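
A minimal sketch of how these saturation levels might be used for rough license planning on a large network is given below. The function and dictionary names are hypothetical, and the model is deliberately crude: it applies the settled percentages directly and does not capture the higher ratios needed on small networks.

```python
# Settled license-to-user percentages for large networks, from the list above.
SATURATION_PERCENT = {"high": 50, "medium": 25, "low": 10}


def licenses_for_large_network(users, saturation_class):
    """Estimate concurrent licenses needed, rounded up to a whole license."""
    percent = SATURATION_PERCENT[saturation_class]
    # Integer ceiling division avoids floating-point rounding surprises.
    return (users * percent + 99) // 100


for saturation_class in ("high", "medium", "low"):
    print(saturation_class, licenses_for_large_network(100, saturation_class))
```

For a small, isolated network the estimate would be too low, since the earlier example needed 8-9 licenses for 10 users even for a high saturation tool.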


STANDARDIZATION 

Another way to minimize the cost of tools, especially COTS, is to maintain a standard list 
of supported software. Raytheon currently maintains a list of 22 standard tools that are 
supported by the main software engineering cost center. This provides a financial incentive 
for programs to use the same tools for design, code, test, CM, documentation, etc., 
without "forcing" them to do so. This is important because there are cases where a 
program is REQUIRED to use a particular tool specified by the contract or by the customer 
(or PREFERS a non-standard tool), and they are able to do this by paying for the software 
out of program funds without impacting other programs who do not need the tool. 
Conversely, if a program can opt to use a "standard" tool, it will incur no extra cost over 
the normal charge to use other cost center software. 

This Standard Software List has encouraged the use of a minimum common set of tools
(sometimes more than one tool per category, however) and has produced economies of scale
in centrally serving licenses, in consolidating training, and in purchasing ability (minimum
number of suppliers, maximum leverage), without unnecessarily restricting programs.

The number of tools on the standard list has varied since the list was started in 1994. It 
increased from 18 in 1994 to 26 in 1995, due to the consolidation of the three divisions and 
several cost centers into one, and the need to expand the standard list based on current tool 
use across all sites. These 26 were reduced to 22 in 1996, in an effort to streamline and 
further control overall costs.


IMPROVED PRODUCTIVITY AND QUALITY


Raytheon has measured changes in organizational productivity and quality over the past 8+
years, and the trends show productivity and quality improving continuously and
simultaneously. The exact impact of increased use of software tools (or COTS tools) is hard to
extract from the data, however, since there were several factors at work at the same time.
These factors included:

• hardware improvement (workstations, servers, networking, etc) 

• process improvement (standards, policies, procedures, inspections, etc) 

• and expanded training 

• in addition to the increased use of tools 

This was the same time period that the Raytheon software laboratories went from SEI Level 
1 to Level 3, achieved ISO 9001, documented overall process improvement savings (Ref. 
Ray Dion), and were awarded the IEEE Process Award. It is clear that better quality 
software was an important contributor, but not the only contributor, to this improvement. 
The two figures below show DSI/MM productivity improvement (Jan 1988 is normalized 
to 1.00), and quality improvement as measured by reduced Cost of Nonconformance (cost 
of rework), which can be considered a proxy for the number of defects. 


[Figure: Tools help improve productivity]

[Figure: Tools help reduce defects & rework]


OTHER OBSERVATIONS

A closer empirical look at the data reveals some other interesting observations: 

• The total number of UNIQUE TOOLS in use at one time has increased from 4 to 62;
  however, (a) the measure of 4 in 1982 is incomplete and probably closer to 6 or 8, and
  (b) 14 of the 62 currently active are "almost dead" (very few users, usually legacy),
  leaving 48 truly currently active.

• The number of DEATHS (discontinued use of a tool) over this period was 22, or an
  average of about 2 per year. Another observation is that tools are often slow to "die",
  as projects gradually discontinue use, but a few keep using them until the projects end.

• The average LIFESPAN of the 22 tools that "died" was 4.4 years. 

• The percentage of tools that reached COMMON status (that is, reaching more than 25% of
  the engineers in the software lab) was 39%

• The percentage of tools that never exceeded NICHE status (that is, never reaching 25% of
  the engineers in the software lab) was 43%

• For the remaining 18% of the tools it is too soon to tell whether they will become

COMMON tools, or will remain NICHE tools 
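
The observations above can be derived mechanically from yearly Fraction-of-Use rows. The sketch below shows one way to do it; the function names and the two sample tools are hypothetical, and the "too soon to tell" class is not modelled.

```python
COMMON_THRESHOLD = 0.25  # COMMON means reaching more than 25% of the lab


def lifespan_years(usage):
    """Count the years in which a tool had any recorded use."""
    return sum(1 for fraction in usage if fraction > 0)


def has_died(usage):
    """A tool has 'died' if it was used at some point but not in the final year."""
    return any(fraction > 0 for fraction in usage) and usage[-1] == 0


def status(usage):
    """Classify a tool by its peak Fraction-of-Use."""
    return "COMMON" if max(usage) > COMMON_THRESHOLD else "NICHE"


# Hypothetical yearly fractions (0 means no regular users that year).
tools = {
    "tool_a": [0.1, 0.3, 0.4, 0.3, 0.1, 0, 0],
    "tool_b": [0, 0, 0.1, 0.1, 0.05, 0.01, 0.01],
}

for name, usage in tools.items():
    print(name, status(usage), lifespan_years(usage), has_died(usage))
```

Here tool_b matches the "almost dead" pattern: it is still counted as active, but only at the 1% level.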
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FUTURE STABILIZATION 


From the data we've analyzed, and from observation of tool use by software engineers on
many programs, it appears that the increase in the number of software tools in use may
level off in the next few years at around 13 per engineer (about 2 Raytheon and about 11
COTS), though it is too soon to determine this with confidence. This projection is shown
in the figure below:



OTHER LESSONS LEARNED 

• THE TRANSITION TO COTS TOOLS IS PERMANENT 

- ALTHOUGH COTS TOOLS ARE NOT A PANACEA 

• THERE IS A SHORT-TERM INCREASE IN COST, THOUGH (HOPEFULLY) A

LONG-TERM COST DECREASE -> THE QUESTION IS: HOW MUCH??

• TO CONTROL COSTS, SUPPORT “STANDARD” TOOLS WITH $ INCENTIVE 

- THESE CAN BE SUPPLIED BY A COST CENTER FOR “FREE” 

• TOOLS CAN HELP IMPROVE PRODUCTIVITY & QUALITY 

- EXACT AMOUNT IS HARD TO DETERMINE; THERE ARE OTHER FACTORS 

• EXPECT A REGULAR "CHURNING" OF THE EXACT TOOL MIX 

- THERE WILL BE REGULAR BIRTHS AND DEATHS (ABOUT 2-3 PER YEAR) 

- THERE ALWAYS SEEMS TO BE (AT LEAST) TWO OF EVERY TYPE OF TOOL 

• THE OVERALL BALANCE OF COTS VS INTERNAL WILL LEVEL OFF


FURTHER STUDY


As was mentioned earlier, this is the first portion of a study of the overall cost of transition 
to COTS software engineering tools at Raytheon over a 13 year period. The second part of 
the study, which will focus on costs, is expected to be completed in 1997. 


TOM LYDON, LAURIE FISCHER, KARL GARDNER 
RAYTHEON RES, MAILSTOP T3MR8 
50 APPLE HILL DRIVE 
TEWKSBURY, MA 01876 

rtl@swl.msd.ray.com, lpf@swl.msd.ray.com, fkg@swl.msd.ray.com


NASA/GODDARD SOFTWARE ENGINEERING WORKSHOP
Greenbelt, MD - December 4, 1996

TECHNOLOGY EVOLUTION:

COTS Transition at Raytheon 1983-1996 


Tom Lydon, Laurie Fischer, Karl Gardner 
Raytheon Company 


Software Engineering Laboratory 

SUMMARY

OVER THE PAST 13 YEARS:

• Data collected on use of SOFTWARE TOOLS 

• Increased NUMBER OF TOOLS per Engineer 

• Shift from INTERNAL tools to COTS tools 

• Driven by ECONOMICS and CUSTOMER REQTS 

- Shift from HW POWER to SW POWER 

- COTS are not a PANACEA 

• Increased PRODUCTIVITY and QUALITY 


WHAT DOES IT MEAN? WHAT’S IN THE FUTURE?


BACKGROUND

• RAYTHEON RES Software Engineering Laboratory

- 5 Major Sites in Massachusetts & Rhode Island 

- 600 (1983) to 1200 (1996) software professionals 

- SEI Level 3, ISO 9001, IEEE Process Award 1995 

• MAJOR BUSINESS AREAS 

- Air Defense, Transportation, Command & Control, 
Naval Systems, Radar, Technology 

• PRIMARY DEVELOPMENT ENVIRONMENT 

- VMS and Unix - 1980s
- Unix - 1990-1995
- Unix and NT - 1995+


DATA

• DATA ON 84 TOOLS 1982-1996

- 25 Raytheon-Developed 

- 59 COTS (mostly CASE) 

- Standard host editors and compilers not included 

• HISTORICAL DATA FOR EACH TOOL 

- Number of Users (as fraction of total lab) 

- Averaged on an annual basis 

- About 600 data points 

• EMPIRICAL ANALYSIS (Sort, count, compare) 

• PRODUCTIVITY AND QUALITY (Normalized)


Category   Raytheon   COTS   TOTAL

CM             3        3       6
CODING         0        9       9
COST           0        4       4
DATA           2        9      11
DB             0        5       5
DEFECT         6        1       7
DESIGN         3        4       7
DOCUM          2       10      12
GIS            0        2       2
GUI            0        4       4
MAINT          3        0       3
MGMT           2        2       4
REQTS          1        4       5
TEST           1        1       2
TRACE          2        1       3

TOTAL         25       59      84


Overall Tool Use

• Use of Software Tools has Steadily Increased


Raytheon Tools

Raytheon Electronic Systems

• Use of Raytheon-Developed Tools has Dropped Off

• Use of COTS Tools has Increased Dramatically


Raytheon vs COTS

• Counterpoint: Swing from Internal to COTS Tools


TOOL CATEGORIES


PROS & CONS

• COTS Tools are not a Panacea


Internal Tools

  PROS
  • Cost is fixed, not proportional to #users
  • Problem fixes usually faster
  • New capabilities can be added faster

  CONS
  • More expensive to develop
  • More manpower required for maint & support
  • Usually lower quality tools
  • Slower initial availability

COTS Tools

  PROS
  • Less expensive to develop
  • Less manpower required for maint & support
  • Usually better quality tools
  • Faster initial availability

  CONS
  • Cost increases almost linearly with #users
  • Problem fixes usually slower
  • New capabilities usually not added quickly


COTS COST

• During Transition, costs initially go UP

[Figure: cost versus TIME]


SUPPORT COST

12/4/96

• Costs Shifting Towards Software (esp. Maintenance)


LICENSE REQUIREMENTS

• For large networks, fewer licenses required per user


STANDARDIZATION

RAYTHEON USES A STANDARD SOFTWARE LIST

• PROGRAMS & DEPTS HAVE OWN PREFERENCES 

• STANDARDS PROVIDED “FREE” BY COST CENTER 

• FINANCIAL INCENTIVE TO USE STANDARD 

• MINIMIZES NUMBER OF SUPPLIERS 

• MAXIMIZES LEVERAGE WITH EACH SUPPLIER 

• SOME NON-STANDARD TOOLS STILL REQUIRED 

• 1994 - 18 STANDARD TOOLS (1 Division) 

• 1995 - 26 STANDARD TOOLS (Merged Divisions) 

• 1996 - 22 STANDARD TOOLS (Budget Constraints)


PRODUCTIVITY

• Tools help improve productivity


DEFECTS/REWORK

• Tools help reduce defects & rework


OTHER METRICS

OTHER OBSERVATIONS IN THE DATA:

• Total UNIQUE TOOLS in use increased from 4* to 62 

- 14 of 62 are almost dead, leaving 48 truly active 

• Number of DEATHS = 22, Average ~2 per year 

- Tools are often slow to actually die 

• Average LIFESPAN of 22 Dead Tools = 4.4 years 

• COMMON Tools (reaching >25% of Lab) = 39% 

• NICHE Tools (never reaching 25% of Lab) = 43% 

- Remaining 18% too soon to tell 



• Tool Saturation, Stabilized Levels


LESSONS LEARNED

• TRANSITION TO COTS TOOLS IS PERMANENT 

- COTS TOOLS ARE NOT A PANACEA 

• SHORT-TERM COST INCREASE; LONG-TERM 

COST DECREASE -> HOW MUCH?? 

• SUPPORT “STANDARD” TOOLS WITH $ INCENTIVE 

- SUPPLIED BY COST CENTER “FREE” 

• WILL HELP IMPROVE PRODUCTIVITY & QUALITY 

- AIDED BY HW & PROCESS IMPROVEMENT 

• REGULAR CHURNING OF EXACT TOOL MIX 

- BIRTHS AND DEATHS (ABOUT 2-3 PER YEAR) 

- ALWAYS SEEMS TO BE TWO OF EVERY TYPE 

• USE OF COTS VS INTERNAL WILL LEVEL OFF 


TOM LYDON
RAYTHEON RES, T3MR8
50 APPLE HILL DRIVE
TEWKSBURY, MA 01876
rtl@swl.msd.ray.com


SEW Proceedings 


201 


SEL-96-002 




Session 4: Reliability 




Identification of Failure-Prone Modules in Two Software System Releases 
N. Ohlsson and C. Wohlin, Linköping University, Sweden


Predicting Software Quality Using Bayesian Belief Networks 
M. Neil and N. Fenton, City University, London 


Data Collection Demonstration and Software Reliability Modeling For a 
Multi-Function Distributed System 
N. Schneidewind, Naval Postgraduate School 


Operational Test Readiness Assessment of an Air Force Software System: A Case Study
A. Goel, Syracuse University, B. Hermann and R. McCanne, U.S. Air Force



Identification of Failure-Prone Modules 

in 

Two Software System Releases 

Niclas Ohlsson and Claes Wohlin
Dept. of Computer and Information Science
Linköping University, S-581 83 Linköping, Sweden
E-mail: (nicoh, clawo)@ida.liu.se

1 Introduction 

This paper presents a case study of fault and failure data from two consecutive releases of a large 
telecommunication system. In this context it is important to have clear interpretations of errors, 
faults and failures. Thus, we would like to make the following distinction between them. Errors 
are made by humans, which may result in faults in the software. The faults may manifest them- 
selves as failures during operation. Thus, faults can be interpreted as defects in the software and 
failures are the actual malfunction in an operational environment. In this paper we have used 
fault-prone modules to denote the modules that account for the highest number of faults disclosed 
during testing, while failure-prone modules is used to denote the modules accounting for the 
highest number of faults disclosed during the first office application and in operation. The general 
objective of the study is to investigate methods of identifying failure-prone software modules. 
Furthermore, the goal is to use the knowledge acquired to improve the software development 
process in order to improve software quality in the future. 

Some early results using parametric statistics have been reported in (Ohlsson and Alberg, 1996). 
The models have since been refined and analysed with non-parametric statistics (Ohlsson 
et al., 1996). Identification of fault-prone modules has also been addressed by other researchers 
(Khoshgoftaar and Kalaichelvan, 1995) and (Munson and Khoshgoftaar, 1992). Few, if any, stud- 
ies have exploited the opportunities to identify not only fault-prone modules, but also failure- 
prone modules which are the main concern of the user. There is also a general lack of studies 
investigating whether identification of fault-prone modules means that we actually also identify 
failure-prone modules. 

Another important issue is to establish when in the development phase we are able to identify 
modules which will be failure-prone in the operational phase. This paper investigates three differ- 
ent times for prediction: history (previous release), the design phase and the test phase. One 
important consideration is to address whether or not fault-prone modules during testing are fail- 
ure-prone during operation. If fault-prone does not imply failure-prone, then we may have to 
improve the test methods. 

The paper is organized as follows. In Section 2, an overview of the study is presented. Section 3
discusses identification of failure-prone modules based on experience from a previous release,
and Section 4 presents results using prediction models based on design measures. In Section 5,
results concerning identification of failure-prone modules based on test data are presented.
Finally, some conclusions are given in Section 6.

2 Overview of study

This paper is part of a long-term empirical study conducted at Ericsson Telecom AB with the 
objective of studying how identification of fault and failure-prone modules can be used to achieve 
cost-effective quality improvement. In release n of the system 130 modules have been analysed 
and in release n+1 232 modules have been investigated. Fault and failure data have been collected 
from functional testing, system testing, first office application (i.e. the first 26 weeks and a 
number of site tests) and operation. It was possible to trace 69 modules developed for release n 
that were modified in release n+1. Release n+1 is a major system revision. Data is currently being 
collected for release n+2. The modules are of the size of 1000 to 6000 lines of code each. 


Promising results concerning identification of fault-prone modules have been presented else- 
where, i.e. design measures were used to identify fault-prone modules (Ohlsson et al., 1996) and 
(Ohlsson and Alberg, 1996). The objective here is to study the identification of failure-prone 
modules based on fault and failure data as well as from design measures. In this paper we have 
used one failure as threshold for the dependent variable, i.e. modules with one or more failures are 
classified as failure-prone. The underlying analysis of design measures is based on ordinal analy- 
sis, as it allows for changing the threshold with regards to what are viewed as being fault- and 
failure-prone modules (Ohlsson et al., 1996). Actual threshold-values are not recommendations; 
thresholds should be determined in individual projects on the basis of, for example, the level of 
criticality of the system and market requirements. The primary objective of the thresholds as pre- 
sented in this paper is to illustrate the outcome when applying the methods for identification of 
failure-prone modules. 

The predictability of the different models is presented in contingency tables, and kappa coefficients 
are calculated to measure the agreement in classification of the modules (Siegel and Castellan, 
1988). The kappa coefficient is the ratio of the proportion of times that the classifications agree 
to the maximum proportion of times that the classifications could agree, both corrected for chance 
agreement. If the classifications completely agree, then kappa=1; whereas if there is no agreement 
beyond chance, then kappa=0. Kappa will assume -1 if there is a perfect misclassification. 
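
As an illustration of the kappa calculation, the following Python sketch computes the coefficient from the four cells of a 2x2 contingency table. It assumes that, for two classifications, the kappa of Siegel and Castellan can be computed in the standard chance-corrected form (observed agreement minus expected agreement, divided by one minus expected agreement); the function and argument names are hypothetical. The cell counts are those of analyses A and B in Table 1.

```python
def kappa(hit, miss, false_alarm, correct_reject):
    """Chance-corrected agreement for a 2x2 contingency table.

    hit            -- actually failure-prone, predicted failure-prone
    miss           -- actually failure-prone, predicted not failure-prone
    false_alarm    -- actually not failure-prone, predicted failure-prone
    correct_reject -- actually not failure-prone, predicted not failure-prone
    """
    n = hit + miss + false_alarm + correct_reject
    observed = (hit + correct_reject) / n
    actual_pos, actual_neg = hit + miss, false_alarm + correct_reject
    pred_pos, pred_neg = hit + false_alarm, miss + correct_reject
    # Agreement expected by chance, from the row and column totals.
    expected = (actual_pos * pred_pos + actual_neg * pred_neg) / (n * n)
    # Undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1 - expected)


kappa_a = kappa(4, 14, 0, 51)    # Table 1, analysis A
kappa_b = kappa(14, 4, 28, 23)   # Table 1, analysis B
print(round(kappa_a, 2), round(kappa_b, 2))
```

Under this assumption the sketch gives values that round to the 0.30 and 0.16 reported for analyses A and B.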

The study is divided into three parts: 


1. Identification of failure-prone modules using data from a previous release 

This part is aimed at investigating whether the information from release n concerning fault- 
and failure-prone modules is a good predictor of failure-prone modules in release n+1. More 
than 90 percent of the modules in release n had one or more faults. Therefore, it is infeasible to 
use one fault as a threshold. Thus, when fault-prone modules from release n are used to predict 
failure-prone modules in release n+1, a threshold of five faults is used for the independent var- 
iable as an indication of potential failure-prone modules. When failure-prone modules in 
release n are used as the independent variable, one failure is used as threshold. 


2. Identification of failure-prone modules using design measures 

The initial objective was to build prediction models in release n for identification of failure- 
prone modules based on design measures, which then should be validated with data from 
release n+1. Due to variation in quality between the two releases this was not possible. Instead 
design metrics were only evaluated within release n+1. Only the best design measure is 
reported here, as the main objective is to investigate different opportunities to identify failure- 
prone modules rather than evaluate which measures are the best predictors. To the best of our 
knowledge there exists no empirical evidence that complexity values higher than a specific 
threshold would indicate either fault- or failure-prone modules. However, there are results 
suggesting a relatively stable distribution in line with the Pareto principle (Ohlsson et al., 1996). 
Therefore, the threshold is based on the percentage of failure-prone modules in release n+1. 
That is, 29 percent of the modules in n+1 had one or more failures. Hence, this percentage 
value is used as a threshold for the design measures. 

3. Identification of failure-prone modules from fault-prone modules 

The objective of this part is to investigate whether the fault-prone modules identified in release 
n and n+1 are good indicators of failure-prone modules in the two releases. This means that 
fault data from testing is used to predict failure-proneness during operation. The rationale for 
selecting thresholds is the same as in part 1. 

To summarize, the main difference is when prediction can be made. The three parts imply three 
different points of time in a project, namely: project start (part 1), design phase (part 2), and test- 
ing phase (part 3). It is important to remember that the sooner we are able to identify modules 
which are likely to be failure-prone, the sooner we can take appropriate measures to deal with 
them. For example, we can allocate the best people, intensify inspections or take other special 
improvement measures. 
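
The percentage-based threshold described in part 2 can be sketched in a few lines of Python. This is an illustration only: the function name, the module names and the input-signal counts are hypothetical and are not data from the study. The sketch ranks modules by a design measure and flags the top fraction, here 0.29 to match the share of failure-prone modules reported for release n+1.

```python
def flag_top_fraction(measure_by_module, fraction):
    """Return the modules with the highest measure values.

    The number of flagged modules is the given fraction of all modules,
    rounded to the nearest whole module.
    """
    ranked = sorted(measure_by_module, key=measure_by_module.get, reverse=True)
    count = round(fraction * len(ranked))
    return set(ranked[:count])


# Hypothetical input-signal counts for ten modules (not data from the study).
signals = {
    "m01": 12, "m02": 3, "m03": 7, "m04": 1, "m05": 20,
    "m06": 5, "m07": 9, "m08": 15, "m09": 2, "m10": 4,
}
flagged = flag_top_fraction(signals, 0.29)
print(sorted(flagged))
```

Because the threshold is a rank position rather than a fixed measure value, the same procedure can be rerun with a different fraction when a project chooses another level of criticality.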

3 Failure-prone modules from history 

For software systems, it is normal practice that a system is regularly upgraded and released in new 
versions. This implies that some parts of the system are the same in different releases. This infor- 
mation can be used to apply experience from one release to the next release or following releases. 
In this empirical study, the hypothesis is that fault- or failure-prone modules in release n are likely 
candidates for being failure-prone in release n+1. It was possible to trace 69 modules developed 
for release n that were modified in release n+1. The data from the historical analysis is shown in 
Table 1. It should be noted that only four modules were failure-prone in release n, see analysis A, 
while 18 modules were failure-prone in release n+1. 

To evaluate the goodness of the predictions, the prediction errors must be considered. This 
includes two different types of errors: failing to identify failure-prone modules and identification 
of modules as failure-prone when they are not. These are hereafter referred to as errors of type I 
and II respectively. It should be noted that a correct identification means actually pin-pointing a 
certain module correctly. 
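
The two error types can be made concrete with a short Python sketch. The function and argument names are hypothetical; the counts used in the example are those of analysis A in Table 1 (4 failure-prone modules identified, 14 missed, 0 modules wrongly flagged, 51 correctly left unflagged).

```python
def misclassification_rates(identified, missed, wrongly_flagged, correctly_unflagged):
    """Return (type I rate, type II rate, overall rate).

    Type I:  failure-prone modules that were not identified.
    Type II: modules identified as failure-prone that are not.
    """
    actual_failure_prone = identified + missed
    actual_not_failure_prone = wrongly_flagged + correctly_unflagged
    total = actual_failure_prone + actual_not_failure_prone
    type_1 = missed / actual_failure_prone
    type_2 = wrongly_flagged / actual_not_failure_prone
    overall = (missed + wrongly_flagged) / total
    return type_1, type_2, overall


type_1, type_2, overall = misclassification_rates(4, 14, 0, 51)
print(f"type I {type_1:.0%}, type II {type_2:.0%}, overall {overall:.0%}")
```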


TABLE 1. Failures identified in release n+1 based on release n 

                              Analysis A (a)              Analysis B (b)              Analysis C (c)
                              Threshold=1                 Threshold=5                 Threshold=5
                              Failure(n)                  Fault(n)                    Fault+Failure(n)
Actual                        F             Not F         F             Not F         F             Not F
Failure-prone(n+1)            4             14            14            4             15            3
(18 observations)
Not failure-prone(n+1)        0             51            28            23            28            23
(51 observations)
Total observations            4             65            42            27            43            26
Misclassifications            78% (14/18)   0% (0/51)     22% (4/18)    55% (28/51)   17% (3/18)    55% (28/51)
of type I and II
Overall misclassifications    20% (14/69)                 46% (32/69)                 45% (31/69)

a. Kappa 0.30 

b. Kappa 0.16 

c. Kappa 0.32 


Analysis A in Table 1 illustrates that even though the type I error is as high as 78%, there is no 
type II error. This means that the modules that are failure-prone in release n are all failure-prone 
in release n+1. Possible explanations for this are the actual type of failure and late erroneous fault 
correction in test. 

For analyses B and C, we have used five faults as a threshold for the independent variable. It has 
earlier been suggested (Khoshgoftaar and Kalaichelvan, 1995) that this should be used as thresh- 
old for fault-prone modules. The threshold could therefore indicate failure-proneness. Using one 
fault is not reasonable since this would identify 63 modules as being failure-prone. Even with a 
threshold of five faults in analysis B as many as 61 percent (42/69) of the modules are identified in 
release n as failure-prone. However, only 78 percent (14/18) of all the failure-prone modules in 
release n+1 are identified. Therefore, fault-prone modules in release n are poor predictors of fail- 
ure-prone modules in n+1. This is also true for analysis C. 

Another possible alternative would be to select a threshold based on the percentage of failure- 
prone modules in release n+1, i.e. assuming that this proportion of fault- and failure-prone mod- 
ules will be stable over later releases. The number of potential failure-prone modules would be 
more realistic using 26 percent (18/69) as a threshold. However, only 28 percent of the failure- 
prone modules would be identified. This also holds for analysis C. Therefore, the two models in 
analyses B and C are not applicable. 

4 Failure-prone modules from design measures 

Earlier studies (Ohlsson et al., 1996) have indicated that models built on design metrics are worth- 
while when the total number of faults and failures are considered as the dependent variable. Thus, 
it is reasonable to try this approach for failure-prone modules. In this study, fourteen different 
design measures are used to build prediction models for release n+1. Spearman’s correlation coefficient 
(Siegel and Castellan, 1988) was used for a first analysis. All potential variables have low 
correlation values (below 0.35). There was, however, a rather low correlation among some of the 
variables, hence it could be possible to improve the model by combining the variables into more 
complex models. Multiplicative aspects of the potential variables will be investigated in later stud- 
ies. In this particular case, the best design measure predictor was IS, which is the number of 
input-signals for a module in the design. The result was later compared with lines of code, which 
performed even worse. 
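
For readers unfamiliar with Spearman's coefficient, the sketch below shows one common way to compute it: rank both variables (tied values share the mean of their rank positions) and take the Pearson correlation of the ranks. The function names and the six data points are hypothetical and are not measurements from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical input-signal counts and failure counts for six modules.
input_signals = [12, 3, 7, 1, 20, 5]
failures = [1, 0, 0, 0, 2, 1]
rho = spearman(input_signals, failures)
print(round(rho, 2))
```

Because only ranks enter the calculation, the coefficient suits ordinal analysis: it is unchanged by any monotone rescaling of a design measure.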

It has been suggested that prediction models should first be developed for one release, validated in 
the succeeding release, and then applied in the third release. However, the quality of the two 
releases varied widely, and it was therefore not possible to do so in this study. From a modelling 
point of view, the number of failure-prone modules in release n was too few. Instead, the explana- 
tory ability of design metrics was evaluated by building the best possible model based on data in 
release n+1. The results shown in Table 2 are based on a threshold of one failure, which corre- 
sponds to 29 percent of the modules. 


TABLE 2. Failures identified in release n+1 based on IS(n+1). 

                              Analysis (a)
                              IS(n+1)
Actual                        F             Not F
Failure-prone(n+1)            28            39
(67 observations)
Not failure-prone(n+1)        39            126
(165 observations)
Total observations            67            165
Misclassifications            58% (39/67)   24% (39/165)
Overall misclassifications    34% (78/232)

a. Kappa 0.18 


From Table 2, it can be seen that the explanatory ability is unsatisfactory, i.e. the misclassification 
is too high, including a large proportion of both type I and II errors. This, in combination with the 
fact that the quality of the two releases differed, suggests that more complete models should be 
investigated, for example including verification effort and quality. 

5 Failure-prone modules from fault-prone modules 

The data from the testing phase can be used for both releases to predict the failure-prone modules. 
The problem of choosing relevant thresholds, discussed with respect to part 1, is relevant for this 
part, too. The results of the analyses are shown in Table 3, using a threshold of five faults for the 
independent variable. 

TABLE 3. Failures identified based on faults disclosed during testing of release n and n+1 respectively. 

Analysis n (a)                Fault(n)
Actual                        F             Not F
Failure-prone(n)              5             8
(13 observations)
Not failure-prone(n)          77            40
(117 observations)
Total observations            82            48
Misclassifications            62% (8/13)    66% (77/117)
Overall misclassifications    65% (85/130)

Analysis n+1 (b)              Fault(n+1)
Actual                        F             Not F
Failure-prone(n+1)            47            20
(67 observations)
Not failure-prone(n+1)        102           63
(165 observations)
Total observations            149           83
Misclassifications            30% (20/67)   62% (102/165)
Overall misclassifications    53% (122/232)

a. Kappa -0.08 

b. Kappa 0.06 


The misclassification is also too high in this analysis. This means that modules that are fault-prone 
during testing are not necessarily failure-prone in operation. A possible explanation is that other 
types of defects are discovered in operation, such as performance problems, that are difficult to test. 
This explanation is supported by experienced developers from Ericsson. This could also explain the 
result in part 1. A possible explanation of the fact that failure-prone modules in n are failure-prone 
in n+1 could be that modules which are critical from a capacity perspective in release n will remain 
so in release n+1. The results indicate the need for a better understanding of the types of defects 
that result in failures and the types of the failures themselves. The results also stress the need to 
identify factors causing the defects which result in failures. Increased understanding is essential 
for quality improvement. 

6 Conclusions 

In this paper we have investigated the opportunity to predict failure-prone modules based on fault 
and failure data from two succeeding releases, design metrics, as well as test data. The study 
revealed that failure-prone modules in release n are failure-prone in n+1. Other suggested inde- 
pendent variables are poor predictors of failure-proneness. However, this is not the same as say- 
ing that they do not explain any of the variation. It only means that on their own they are poor 
explanatory factors. Instead, the study suggests that methods that combine these different inde- 
pendent variables are needed. 

In this study, we have addressed two consecutive releases of a software system. This is an impor- 
tant aspect as in most cases it is not possible to both build, validate and use a prediction model 
within one release. It is, thus, important to investigate how to build models in one release, validate 
the model in the next release and then use the model in the third release. The transferability of a 
model between a software system’s releases is crucial to success in the mission of identifying fail- 
ure-prone modules prior to the operational phase. 

A major problem with predictions is that failures are dynamic, hence it may be difficult to identify 
failure-prone modules using static measures. This is an issue which has to be further studied. One 
potential solution would be to take the use of modules into account when predicting failure-proneness. 
This would allow for capturing the dynamic aspects of usage in the independent variable. 

Another important issue which has been addressed here is the point of time when we are able to 
identify failure-prone modules. To improve the usefulness of the predictions, they should preferably 
be done at an early stage. In this study, we have focused on data from the previous release, the 
design and the test phase. The knowledge from the previous release is important in identifying 
failure-prone modules, but this is not a feasible approach for new modules. Thus, it is very impor- 
tant to find early indicators of failure-proneness, since this is the only way to enable us to address 
the problem within the same release. 

Models which identify failure-prone modules are important not only in enabling prediction dur- 
ing the operational phase, but also as a planning and control tool during development. Managers 
may use these models to improve the resource allocation for design, both in terms of effort and 
experience. Furthermore, knowing which modules are most likely to be failure-prone in operation 
suggests that these modules should be tested and inspected differently. Therefore more attributes need 
to be considered and incorporated in the models, for example verification effort and quality, in 
line with Fenton et al. (Fenton et al., 1995), to explain the variation and to be able to apply the 
models in subsequent releases. 

Future work should not only aim at building these more complete models, but also aim at investigating 
additive and multiplicative aspects of design measures and measures from different 
phases, in order to gain more knowledge about how such a component fits into a more complete 
model. The results in this study also suggest that prediction models that are only based on test 
data will have limited applicability in real projects aiming at addressing operational issues. 

Abstract 

In the absence of an agreed measure of software quality the density of 
defects has been a very commonly used surrogate measure. As a result 
there have been numerous attempts to build models for predicting the 
number of residual software defects. Typically, the key variables in these 
models are either size and complexity metrics or measures arising from 
testing information. There are, however, serious statistical and 
theoretical difficulties with these approaches. Using Bayesian Belief 
Networks we can overcome some of the more serious problems by 
taking account of all the diverse factors implicit in defect prevention, 
detection and complexity. 


1. Background 

For the last 20 years the software engineering community has spent much effort in 
trying to answer the question, "Can we predict the quality of our software before we 
use it?". There are literally scores of papers, articles and reports advocating 
statistical models, metrics and solutions which purport to answer this question. 
Generally, efforts have tended to concentrate solely on one of the following three 
problem perspectives: 

a) Predicting the number of defects in the system using software size and 
complexity metrics 

The earliest study of the relationship between defects and complexity appears to 
have been [Akiyama,1971] which was based on a system developed at Fujitsu, 
Japan. It is typical of many regression based "data fitting" models which became 
common-place in the literature (such as [Ferdinand 1974], [Lipow 1982], [Gaffney 
1984], [Basili and Perricone 1984], [Shen 1985], [Compton and Withrow 1990], 
[Moller and Paulish 1993]). The study showed that linear models of some simple 
metrics provide reasonable estimates for the total number of defects d (the 
dependent variable) which is defined as the sum of the defects found during testing 
and the defects found during two months after release. Although there is no 
convincing evidence to show that any of the hundreds of published complexity 
metrics are good predictors of defect density, there is a growing body of evidence 
that some of these metrics may be useful in outlier analysis (especially when 
grouped together) [Bache and Bazzana 1993] — they can be used to predict which 
of a set of modules is likely to be especially defect-prone. 

b) Inferring the number of defects from testing information 

Some of the most promising local models for predicting residual defects involve very 
careful collection of data about defects discovered during early inspection and 
testing phases. A notable example of this is reported by the IBM NASA Space 
shuttle team [Keller 1992]. Another class of testing metrics that appears to be quite 
promising for predicting defects is the class of so-called test coverage measures 
[Fenton and Pfleeger 1996]. For a given strategy and a given set of test cases we 
can ask what proportion of coverage has been achieved. The resulting metric is 
defined as the Test Effectiveness Ratio (TER) with respect to that strategy. Clearly 
we would expect defect rate to decrease as the values of these metrics increase. 
[Veevers and Marshall 1994] report on some defect and reliability prediction models 
using these metrics which give quite promising results. 

c) Assessing the impact of design or process maturity on defect counts. 

There are many experts who argue that the quality of the development process is 
the best predictor of product quality. The simplest metric of process quality is the 
5-level ordinal scale SEI Capability Maturity Model ranking. Despite its widespread 
popularity, there is no convincing evidence to show that higher maturity companies 
generally deliver products with lower residual defect rate than lower maturity 
companies. Nevertheless, this seems to be a widely held assumption and is 
therefore important in explaining and predicting defects. 


2. The need to take account of diverse factors 

Despite the many efforts described above there appears to have been little overall 
improvement in the accuracy of the predictions made using these models (if 
predictions are formally made at all) or indeed whether the models make sense. 
Broadly speaking there are a number of serious statistical and theoretical difficulties 
that have caused these software quality prediction problems ([Neil 1992] provides 
explicit criticisms of many of the models). To avoid these problems we need to take 
account of all the diverse factors implicit in defect prevention, detection and 
complexity. 

Perhaps the most critical issue in any scientific endeavour is agreement on the 
constituent elements or variables of the problem under study. Models are developed 
to represent the salient features of the problem in a systemic fashion. This is as 
much the case in physical sciences as social sciences. For instance, in macro- 
economic prediction we could not predict the behaviour of an economy without an 
integrated, complex, model of all of the known, pertinent variables. Choosing to 
ignore or forgetting to include key variables such as savings rate or productivity 
would make the whole exercise invalid and meaningless. Yet this is the position that 
many software practitioners are in - they are being asked to accept simplistic 
models which are missing key variables that are already known to be enormously 
important. Predicting the number of defects discovered based on lines of code 
alone is as much use as predicting a person’s IQ from a knowledge of their shoe 
size. 

Our view is that the isolated pursuit of these single-issue perspectives on the quality 
problem is, in the longer term, fruitless. The solution to many of the difficulties 
presented above is to develop prediction models that unify the diverse software 
quality prediction models. This unification will help produce new systematic models 
that better represent the complex relationships inherent in software engineering. 
Only when such unified models are developed will statistical experimentation and 
then practical use be warranted. 

As well as facing up to the complexity inherent in software engineering we must also 
recognise that modelling the actions of the designer and manager is crucial if we 
are to predict the quality of the final product. Again and again experience dictates 
that it is good managers and designers that determine the difference between 
failure and success. However, researchers have tended to ignore the issue of 
human intervention even though we know it is the key variable in software design. A 
consequence of this is that subjectivity and uncertainty are all-pervasive in software 
development. Project managers make decisions about quality and cost using best 
guesses; it seems to us that will always be the case and the best that researchers 
can do is a) recognise the fact and b) improve the ‘guessing’ process. 

The result of inaccurate modelling and inference is perhaps most evident in the 
debate that surrounds the ‘Is Bigger Better?’ dilemma. This is the phenomenon that 
larger modules have lower defect densities [Basili and Perricone 1984] and [Shen 
1985]. [Moller and Paulish 1993] provide further evidence, and also examined the 
effect of modifications and reuse on defect density. Similar experiences are 
reported by [Hatton 1993, 1994]. Basili and Perricone argued that this may be 
explained by the fact that there are a large number of interface defects distributed 
evenly across modules, and that larger modules tend to be developed more 
carefully. Others have mentioned the possible effects of testing. 

The notion that larger modules have lower defect density is surprising because it 
questions the whole edifice of problem and design decomposition so central to 
software engineering. It suggests that building bigger modules will result in fewer 
defects overall. To act on these results would mean throwing away much of what is 
being advocated in structured, object-oriented and formal design - ‘Why should we 
apply decomposition when it doesn’t improve quality?’. Post-hoc explanations 
cannot easily dismiss the uncomfortable significance of this result. 


3. Bayesian Belief Networks (BBNs) 

Meeting the above modelling challenges appears onerous when one considers the 
tools previously available to researchers and practitioners. They have had to rely on 
the power of classical statistical analysis tools, such as regression, discriminant 
analysis and correlation. Classical methods demand simple linear structures and a 
wealth of data so often missing in software engineering. These methods have 
severely restricted the scale of problems that could be tackled. However, a relatively 
new but rapidly emerging technology has provided an elegant solution enabling us 
to push back the boundary of the problems that can be attacked: Bayesian Belief 
Networks (BBNs) [Pearl, 1988]. 

A BBN is a graphical network that represents probabilistic relationships among 
variables. BBNs enable reasoning under uncertainty and combine the advantages 
of an intuitive visual representation with a sound mathematical basis in Bayesian 
probability. With BBNs, it is possible to articulate expert beliefs about the 
dependencies between different variables and to propagate consistently the impact 
of evidence on the probabilities of uncertain outcomes, such as ‘future system 
reliability’. BBNs allow an injection of scientific rigour when the probability 
distributions associated with individual nodes are simply ‘expert opinions’. A BBN 
will derive all the implications of the beliefs that are input to it; some of these will be 
facts that can be checked against the project observations, or simply against the 
experience of the decision makers themselves. There are many advantages of 
using BBNs, the most important being the ability to represent and manipulate 
complex models that might never be implemented using conventional methods. 
Because BBNs have a rigorous, mathematical meaning there are software tools that 
can interpret them and perform the complex calculations needed in their use. The 
specific tool used here is Hugin Explorer [Hugin 1996], which provides a graphical 
front end for inputting the BBNs in addition to a computational engine for the 
Bayesian analysis. 
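
To make the idea of propagating evidence concrete, the following Python sketch performs inference by brute-force enumeration on a three-node chain. It is a toy illustration, not the model discussed in this paper and not a description of how a tool such as Hugin works internally: the node names, states and all probability values are hypothetical, and real tools rely on more efficient propagation algorithms.

```python
from itertools import product

# Hypothetical 'expert opinion' probability tables for a three-node chain:
# effort -> introduced defects -> detected defects.
P_EFFORT = {"adequate": 0.6, "inadequate": 0.4}
P_INTRODUCED = {  # P(introduced | effort)
    "adequate": {"low": 0.8, "high": 0.2},
    "inadequate": {"low": 0.3, "high": 0.7},
}
P_DETECTED = {  # P(detected | introduced)
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.4, "high": 0.6},
}
NAMES = ("effort", "introduced", "detected")


def joint(effort, introduced, detected):
    """Joint probability of one complete assignment (chain rule on the graph)."""
    return (P_EFFORT[effort]
            * P_INTRODUCED[effort][introduced]
            * P_DETECTED[introduced][detected])


def posterior(query, evidence):
    """Distribution of the query node given observed node states."""
    totals = {}
    for states in product(P_EFFORT, ("low", "high"), ("low", "high")):
        world = dict(zip(NAMES, states))
        if all(world[name] == state for name, state in evidence.items()):
            totals[world[query]] = totals.get(world[query], 0.0) + joint(*states)
    norm = sum(totals.values())
    return {state: p / norm for state, p in totals.items()}


# Observing many detected defects raises the belief that effort was inadequate.
belief = posterior("effort", {"detected": "high"})
print(belief)
```

With these hypothetical tables, the belief in inadequate effort rises from its prior of 0.4 to about 0.6 once a high number of detected defects is observed, which is the kind of consistent updating the text describes.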


4. The Defect Density BBN 



Figure A - BBN Topology 


The topology of the Defect Density BBN is shown in Figure A. The ellipses 
represent ‘chance’ variables, the rectangles show the ‘decisions’, the diamonds 
represent ‘utility’ (cost/benefit) variables and the arrows show the flow of information 
or cause-effect links. The variables represented are measured on ordinal, 
subjective, scales. Subjective scales are used to make the model simpler; there is 
no theoretical impediment to modelling ratio scales and continuous variables. Each 
variable has the following states: very high, high, medium, low, very low or none 
(optional for some variables). The probabilities attached to each of these states are 
determined from an analysis of the literature or common-sense assumptions about 
the direction and strength of relations between variables. 

The BBN can be explained in two stages. The first stage covers the life-cycle 
processes of specification, design or coding and the second stage covers testing. In 
Figure A problem complexity represents the degree of complexity inherent in the set 
of problems to be solved by development. We can think of these problems as being 
discrete functional requirements in the specification. Solving these problems 
accrues benefits to the user. At the specification stage a project manager assesses 
the complexity of the problems and assigns design effort accordingly. The skill with 
which this is done is denoted by the variable: assessor skill - specification. This 
assessment process could involve formal measurement, using function points for 
example, subjective judgement or some combination of both. Assessing the 
complexity of the problem accrues an assessment cost - specification. Any mismatch 
between the problem complexity and design effort is likely to cause the 
introduction of defects and a greater design complexity. Hence the arrows between 
design effort, problem complexity, introduced defects and design complexity. For 
example an optimistic project manager may allocate a small amount of design effort 
to a complex problem simply because the complexity was underestimated during 
assessment of the specification. Applying design effort incurs a design cost. 

In Figure A the testing stage follows the design stage. Here design complexity is 
assessed by the project manager in order to gauge the amount of testing effort to 
allocate. This decision is represented by the assessor skill - testing variable. This is 
similar to the specification assessment process in that the project manager may 
measure the design complexity directly using appropriate static or dynamic metrics 
or will make a guess based on intuition and experience. The extent to which either 
of these measures precisely the actual design complexity will be uncertain. Doing the 
assessment will incur assessment cost - testing. Ideally any testing effort allocated 
would match that required by the design complexity. However in practice the testing 
effort actually allocated may be much less, whether by intent or accident. The mis- 
match between testing effort and design complexity will influence the number of 
defects detected, which is bounded by the number introduced. Fixing these defects 
during testing incurs a de-bugging cost. The difference between the defects 
detected and defects introduced is the residual defects count. Any residual defects 
will be released with the product and may increase the maintenance costs, incurred 
by the user and maintainer. 

Figure B - Is Bigger Better? Dilemma 

Figure B shows the execution of the defect density BBN model for the ‘Is Bigger 
Better?’ dilemma using the Hugin Explorer tool. Each of the decision and chance 
variables is shown as a window with a histogram of the predictions made based on 
the facts entered. The scenario runs as follows. A very complex problem is 
represented as a fact set at ‘very high’. Assume the project manager performs no 
precise estimation on this so the assessment skill - specification variable is set to 
‘no measurement’. This results in the allocation of ‘high’ design effort, rather than 
‘very high’ commensurate with the problem complexity. The model then propagates 
these ‘facts’ and predicts the design complexity with a peak at ‘high’ with probability 
of approx. 90%. The introduced defects follows a modal distribution shape with a 
peak at ‘medium’ with probability of around 27%. We may also find that the project 
manager is again optimistic. He does not measure the design complexity and 
allocates a ‘low’ level of testing effort. This results in low levels of defects detected, 
with approximately 60% probability of finding no defects at all. The predicted 
values for detected and introduced defects are then propagated to predict the residual 
defects. Residual defects peaks at ‘low’ with around 40% probability but with a 
significant tail towards medium and high numbers of residual defects. 

From the model we can see a credible explanation for observing large ‘modules’ 
with lower defect densities. Under-allocation of design effort for complex problems 
results in more introduced defects and higher design complexity. Higher design 
complexity requires more testing effort, which is unavailable, leading to fewer defects 
being discovered than are actually there. Dividing the small detected defect counts 
by large design complexity values will result in small defect densities! The model 
explains the “is bigger better” phenomenon without ad-hoc explanation or 
identification of ‘outliers’. 
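
The arithmetic behind this explanation can be sketched with a toy calculation. The figures below are invented purely for illustration (they are not outputs of the BBN model or data from any project), and module size in KLOC is used as a simple stand-in for design complexity.

```python
# Illustrative figures only: they are invented for this sketch and are not
# outputs of the BBN model or measurements from a real project.

def defect_density(defects: int, size_kloc: float) -> float:
    """Return defects per thousand lines of code (KLOC)."""
    return defects / size_kloc

# Small module: testing effort matches complexity, so most defects are found.
small = {"size_kloc": 1.0, "introduced": 10, "detected": 9}
# Large, complex module: testing effort is under-allocated, so few are found.
large = {"size_kloc": 10.0, "introduced": 120, "detected": 12}

for name, module in (("small", small), ("large", large)):
    observed = defect_density(module["detected"], module["size_kloc"])
    residual = module["introduced"] - module["detected"]
    print(f"{name}: observed density = {observed:.1f} defects/KLOC, "
          f"residual defects = {residual}")
```

With these assumed numbers the large module reports the lower observed density (1.2 against 9.0 defects/KLOC) while carrying far more residual defects (108 against 1), which is the pattern the model accounts for.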


5. The Way Forward 

At a general level we can see how the use of BBNs and the defect density model 
provides a significant new approach to modelling software engineering processes 
and artefacts. The dynamic nature of this model provides a way of simulating 
different events and identifying optimum courses of action based on uncertain 
knowledge. These benefits are reinforced when we examine how the model 
explains known results, in particular the ‘Is Bigger Better?’ dilemma. Our new 
approach shows how we can build complex webs of interconnection between 
process, product and resource factors in a way hitherto unachievable. We have also 
shown how we can integrate uncertainty and subjective criteria into the model 
without sacrificing rigour, and illustrated how decision-making throughout the 
development process influences the quality achieved. 

The benefits of this new approach are: 

• it is more useful for project management than outlier analysis and classical 
statistics 

• it incorporates current research ideas and experience 

• it can be used to train managers and enable comparison of different decisions by 
simulation and what-if analyses 

• it integrates a form of cost and quality forecasting 

So far we have explained historical results rather than real projects. Much work 
remains to be done to: 

• provide guidelines on how to apply the approach to specific situations 

• develop a modular approach where whole development processes can be 
modelled using linked BBNs 

• assess the validity of the model by testing its predictions on real projects 

We have embarked on the above tasks in the area of safety cases in the CEC 
ESPRIT project SERENE (Safety and Risk Evaluation using Bayesian Nets) and will 
be improving it for statistical software process control in the IMPRESS (Improving 
the Software Process using Bayesian Nets) project funded by UK EPSRC. We will 
be applying the defect density BBN model to a project with Ericsson Radio Systems 
in Sweden and are working with the UK Defence Research Agency (DRA) to 
develop BBNs for procurement processes. 
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Published defect densities 

Baseline systems using defect 
density (Defects/KLOC) 
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industrial 

indices: 


Source                 Density 

USA and Europe         5-10 
Japan                  < 4 
Motorola               1-6 
Pfleeger et al         0.30 
Schlumberger           0.13 
Cleanroom              2.70 
Ostrolenk and Neil     1.30 
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Lines of Code Models 



D: Defects 
L: Lines of Code 

Akiyama 

$D = 4.86 + 0.018\,L$ 

Gaffney 

$D = 4.2 + 0.0015\,L^{4/3}$ 

(optimum module size 877 LOC) 

Compton and Withrow 

$D = 0.069 + 0.00516\,L + 0.00000047\,L^{2}$ 

(optimum module size 83 Ada LOC) 
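
As a hedged check on how an ‘optimum module size’ can follow from such a model, the sketch below minimises the defect density D/L for Gaffney's equation. It assumes the exponent in that equation is 4/3, a reading that reproduces the quoted optimum of 877 LOC. The function names are ours and purely illustrative.

```python
def gaffney_defects(loc: float) -> float:
    """Gaffney's model, read as D = 4.2 + 0.0015 * L**(4/3)."""
    return 4.2 + 0.0015 * loc ** (4.0 / 3.0)

def gaffney_density(loc: float) -> float:
    """Defects per line of code, D / L."""
    return gaffney_defects(loc) / loc

def gaffney_optimum_size() -> float:
    """Module size that minimises D / L.

    D/L = 4.2/L + 0.0015 * L**(1/3). Setting the derivative to zero gives
    L**(4/3) = 3 * 4.2 / 0.0015 = 8400, hence L = 8400**(3/4).
    """
    return (3.0 * 4.2 / 0.0015) ** 0.75

if __name__ == "__main__":
    best = gaffney_optimum_size()
    print(f"optimum size: {best:.0f} LOC, density there: {gaffney_density(best):.5f}")
```

The same calculation is not repeated for the Compton and Withrow equation, since their quoted optimum may rest on further analysis beyond the polynomial shown on the slide.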
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Problems 


Using any single Quality Model 
will be grossly misleading 

• defects not solely caused by design 
complexity or size 

• models ignore complexity of problem 

• if you don’t test you don’t find defects 

• competent people produce ‘better’ designs 

• we cannot trust defect density figures 




Solution 


• Need to better reflect ‘difficulties’ of 
quality management 

• Synthesise partial quality models 


- include elements from each approach 

- explain existing empirical results 

- consistent with ‘good’ sense 

• Multivariate and ‘messier’ 


• Cope with uncertainty and subjectivity 
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Bayesian Belief Networks 
(BBNs) 

Consists of three major components: 

- graphical model 

- conditional probability tables modelling 
prior probabilities and likelihoods 

- Bayes’ theorem applied recursively to 
propagate data through network 

Graph topology models cause-effect 
reasoning structures 




Propagation 


Data entered 
E updates parent, C 


C updates neighbours, 
A, F 


root nodes, A, C update 
children, B, D, E, F 


Algorithm based on award winning work 
by Lauritzen and Spiegelhalter 
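
The first propagation step (evidence entered at E updating its parent C) can be illustrated with a minimal two-node sketch. This is plain Bayes' theorem on a single link, not the Lauritzen and Spiegelhalter algorithm itself. The state names and all probabilities are hypothetical values chosen only for the example.

```python
# Two-node network C -> E. All probabilities are hypothetical.
prior_c = {"high": 0.3, "low": 0.7}          # P(C)
likelihood_e = {"high": 0.8, "low": 0.2}     # P(E = observed | C)

def posterior_given_evidence(prior, likelihood):
    """Bayes' theorem: P(C | E) = P(E | C) P(C) / sum over c of P(E | c) P(c)."""
    joint = {state: likelihood[state] * prior[state] for state in prior}
    evidence = sum(joint.values())
    return {state: value / evidence for state, value in joint.items()}

posterior = posterior_given_evidence(prior_c, likelihood_e)
print(posterior)
```

In a full network the updated belief in C would in turn be passed on to its neighbours, as the slide describes.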
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Conclusions 


• Equivocal results when partial quality 
models applied 

• New model is synthesis of partial 
models 


• Bayesian Belief Networks offer 
technology to implement new model 


Coherent model of expertise - empirical 
validation remains to be done 


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Popular software reliability models treat software as a single entity and model the failure process in accordance with this 
perspective. However, in a MFDS, with multiple clients and servers, this approach is not applicable. Consequently, a software 
reliability model was developed that takes into account the fact that not all software defects and failures result in system failures in 
a client-server system. In this model there are critical clients and servers: clients and servers with critical functions (e.g., network 
communication) that must be kept operational for the system to survive. There are also non-critical clients and servers with non-critical 
functions (e.g., email). These clients and servers also act as backups for critical clients and servers, respectively. The system does not 
fail unless all non-critical clients fail and one or more critical clients fail, or all non-critical servers fail and one or more critical servers 
fail. 


The Marine Corps Tactical System Support Activity (MCTSSA) required the development of such a model because the MFDS 
is the type of system that is developed by this agency, where valid predictions of software reliability are important for evaluating the 
reliability of systems that will be deployed in the field. 

CLIENT-SERVER SOFTWARE RELIABILITY PREDICTION 

This section provides an introduction to client-server software reliability prediction and provides definitions of several important 
terms. Too often the assumption is made, when doing software reliability modeling and prediction, that the software involves a single 
node. The reality in today's increasing use of multi-node client-server systems is that there are multiple entities of software that execute 
on multiple nodes that must be modeled in a system context, if realistic reliability predictions and assessments are to be made. For 
example, if there are $N_c$ clients and $N_s$ servers in a client-server system, it is not necessarily the case that a software failure in any of 
the $N_c$ clients or $N_s$ servers, which causes the node to fail, will cause the system to fail. Thus, if such a system were to be modeled 
as a single entity, the predicted reliability would be much lower than the true reliability because the prediction would not account 
for criticality and redundancy. The first factor accounts for the possibility that the survivability of some clients and servers will be 
more critical to continued system operation than others, while the second factor accounts for the possibility of using redundant nodes 
to allow for system recovery should a critical node fail. To address this problem, we must identify which nodes (clients and servers) 
are critical and which are not critical. We use the following definitions: 

Node: A hardware element on a network, generally a computer, that has a network interface card installed [NOV95]. 

Client: A node that makes requests of servers in a network or that uses resources available through the servers [NOV95]. 

Server: A node that provides some type of network service [NOV95]. 

Client-Server Computing: Intelligence, defined either as processing capability or available information, is distributed across multiple 
nodes. There can be various degrees of allocation of computing function between the client and server, from one extreme of an 
application running on the client but with requests for data to the server, to the other extreme of a server providing centralized 
processing (e.g., mail server) and sharing information with the clients [NOV95]. The terms client-server computing and distributed 
system are used synonymously. 
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Critical function: An application function that must operate for the duration of the mission, in accordance with its requirement, in 
order for the system to achieve its mission goal (e.g., the requirement states that a military field unit must be able to send messages 
to headquarters and receive messages from headquarters during the entire time that a military operation is being planned). This type 
of function operates in the network mode, which means that the application requires more than a single client to perform its function; 
thus client to server or client to client communication is required. 

Non-critical function: An application function that does not have to operate for the duration of the mission in order for the system 
to achieve its mission goal (e.g., it is not necessary to perform word processing during the entire time that a military operation is being 
planned). Often this type of function operates in the standalone mode, which means that a single client performs the application 
function; thus client to server or client to client communication is not required, except for the possible initial downloading of a program 
from a file server or the printing of a job at a print server. 

Critical clients and servers: Nodes with critical functions, as defined above. These nodes must be kept operational for the system 
to survive, either by incurring no failures or by reconfiguring non-critical nodes to operate as critical nodes. 

Non-critical clients and servers: Nodes with non-critical functions, as defined above. These nodes also act as backups for the critical 
nodes, should the critical nodes fail. 

Software Defect: Any undesirable deviation in the operation of the software from its intended operation, as stated in the software 
requirements. 

Software Failure: A defect in the software that causes a node (either a client or a server) in a client-server system to be unable to 
perform its required function within specified performance requirements (i.e., a node failure). 

System Failure: The state of a client-server system, which has experienced one or more node failures, wherein there are insufficient 
numbers and types of nodes available for the system to perform its required functions within specified performance requirements. 

MODEL FORMULATION 

By defining System Nodes, Node Failure Probabilities, and Failure States, the user will be able to compute the probability of 
system failure given that a node failure has occurred. Start by defining the number and type of MFDS nodes as follows: 

System Nodes 

$N_{cc}$: Number of Critical Client nodes. 

$N_{nc}(t)$: Number of Non-Critical Client nodes. 

$N_{cs}$: Number of Critical Server nodes. 

$N_{ns}(t)$: Number of Non-Critical Server nodes. 

The sum of these nodes should equal the total number of nodes: 

$N = N_{cc} + N_{nc}(t) + N_{cs} + N_{ns}(t)$. (1) 

As long as the system survives, $N_{cc}$ and $N_{cs}$ are constants because a failure of a critical node will result in a non-critical node 
replacing it, if there is a non-critical node available. A change in software configuration may be necessary on the former non-critical 
node in order to run the failed critical node's software. If a critical node fails and there are no non-critical nodes available on which 
to run the failed critical node's software, the system fails. 

In contrast, $N_{nc}(t)$ and $N_{ns}(t)$ are decreasing functions of operating time because these nodes replace failed critical nodes, and 
are not themselves replaced, where $N_{nc}(0)$ is the number of non-critical clients and $N_{ns}(0)$ is the number of non-critical servers at the 
start of system operation, respectively. In addition, if a non-critical node fails, the function that had been operational on the failed node 
can be continued on another node of this type and the system can continue to operate in a degraded state. When either a non-critical 
node replaces a critical node or a non-critical node fails, $N_{nc}(t)$ or $N_{ns}(t)$ is decreased by one, as appropriate. 

Node Failure Probabilities 


We must also account for the following node failure probabilities: 

$p_{cc}$: probability of a software defect causing a critical client node to fail. 

$p_{nc}$: probability of a software defect causing a non-critical client node to fail. 

$p_{cs}$: probability of a software defect causing a critical server node to fail. 

$p_{ns}$: probability of a software defect causing a non-critical server node to fail. 

These probabilities are important to know individually in the analysis; they are also important in the computation of the probability 
of system failure. 

The general function for the probability of system failure, given a node failure, is the following: 

$P_{\text{sys}\mid\text{node fails}} = f(N_{cc}, p_{cc}, N_{nc}, p_{nc}, N_{cs}, p_{cs}, N_{ns}, p_{ns})$ (2) 

Equation (2) means that the probability of a system failure, given a node failure, is dependent on the four node counts and the 
corresponding four failure probabilities. The four probabilities are computed from data that is derived from a defect database (defect 
descriptions, defect classifications, and administrative information) as follows: 


$p_{cc} = \sum_I f_{cc}(I) / D$, where $f_{cc}(I)$ is the critical client node failure count in interval I; (3) 

$p_{nc} = \sum_I f_{nc}(I) / D$, where $f_{nc}(I)$ is the non-critical client node failure count in interval I; (4) 

$p_{cs} = \sum_I f_{cs}(I) / D$, where $f_{cs}(I)$ is the critical server node failure count in interval I; (5) 

$p_{ns} = \sum_I f_{ns}(I) / D$, where $f_{ns}(I)$ is the non-critical server node failure count in interval I; (6) 

and the total defect count across all intervals is $D = \sum_I d(I)$, (7) 


where I is the identification of an interval of operating time of the software and $d(I)$ is the total defect count in interval I. 
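
Equations (3)-(7) amount to dividing each summed failure count by the summed defect count. A minimal sketch follows. The function name, the dictionary keys, and the per-interval sample data are illustrative assumptions. Only the ratio 24/4048 for critical clients is taken from the application example later in the paper.

```python
from typing import Dict, List

def node_failure_probabilities(
    failures: Dict[str, List[int]], defects: List[int]
) -> Dict[str, float]:
    """Equations (3)-(6): p_x = sum over I of f_x(I), divided by D.

    `failures` maps a node type (e.g. "cc") to its per-interval failure
    counts; `defects` holds the per-interval total defect counts d(I).
    """
    total_defects = sum(defects)                      # equation (7)
    if total_defects == 0:
        raise ValueError("no defects recorded, probabilities are undefined")
    return {kind: sum(counts) / total_defects for kind, counts in failures.items()}

# Hypothetical three-interval data set, for illustration only.
sample = node_failure_probabilities(
    {"cc": [1, 0, 1], "nc": [2, 3, 1], "cs": [0, 0, 1], "ns": [1, 1, 0]},
    defects=[10, 20, 10],
)
print(sample)

# The critical-client figure quoted in the application example:
print(round(24 / 4048, 6))
```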

In a specific application, Boolean expressions (i.e., expressions containing AND, OR, and NOT logic operations) are used to 
search the defect database and extract the failure counts (e.g., $f_{cc}(I)$) that are used to compute equations (3)-(6). These expressions 
specify the conditions that qualify a defect as a node failure (e.g., a defect that is a General Protection Fault that affects network 
operations on a Windows-based system). 

Failure States 

Next, note that at a given instant in test or operational time t, a MFDS may be in one of three failure states that 
pertain to the survivability of the system, as follows, in decreasing order of capability: 

Degraded - Type 1: A software defect in a non-critical node causes the node to fail. As a result, the system operates in a degraded 
state, with one less non-critical node. No reconfiguration is necessary because the failed node is not replaced. 

Degraded - Type 2: A software defect in a critical node causes the node to fail. As a result, the system operates in a degraded state, 
but one that is more severe than Type 1, because there would be both a temporary loss of one critical node during reconfiguration and 
a permanent loss of one non-critical node (i.e., one of the non-critical nodes takes over the function of the failed critical node). Under 
certain conditions — see Table 1 — this type of node failure can cause a system failure. 

The current version of the model assumes that node failures are not recoverable on the node where the failure occurred, during 
the mission. The next version of the model will contain a repair function to account for the case where a node failure is repaired and 
the node is put back into operation during the mission. 

System Failure: The system fails under the following conditions: 1) all non-critical clients fail and one or more critical clients fail, 
or 2) all non-critical servers fail and one or more critical servers fail. The reason for this failure event formulation is that, in the event 
of a failed critical node, a non-critical node can be substituted, possibly with a different software configuration. However, if all non- 
critical clients (servers) fail, and one or more critical clients (servers) fail, there would be no non-critical clients (servers) left to take 
over for the failed critical clients (servers). 
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The failure states are summarized in Table 1. 


System Failure Probability 

Having equations (3)-(6) for the node failure probabilities in hand, the model applies them to computing 
the probability of system failure, equation (12). The intermediate equations leading up to equation (12) follow: 


The probability that one or more of the $N_{cc}$ critical clients fail, given that the software fails, is: 

$P_{cc} = 1 - (1 - p_{cc})^{N_{cc}}$ (8) 

The probability that all $N_{nc}(t)$ non-critical clients have failed by time t, given that the software fails, is: 

$P_{nc}(t) = (p_{nc})^{N_{nc}(t)}$ (9) 

The probability that one or more of the $N_{cs}$ critical servers fail, given that the software fails, is: 

$P_{cs} = 1 - (1 - p_{cs})^{N_{cs}}$ (10) 

The probability that all $N_{ns}(t)$ non-critical servers have failed by time t, given that the software fails, is: 

$P_{ns}(t) = (p_{ns})^{N_{ns}(t)}$ (11) 


Equations (8) and (9) assume that client failures are independent (i.e., one type of node failure does not cause another type of node 
failure). This is the case because a failure in one client's software would not cause a failure in another client's software. However, it 
is possible that a failure in server software could cause a failure in client software, such as a client accessing a server that has corrupted 
data. Also, equations (10) and (11) assume that server failures are independent. This is the case because a failure in one server's 
software would not cause a failure in another server's software. However, it is possible that a failure in client software could cause a 
failure in server software, such as a client with corrupted data accessing a server. No cases of client failures caused by server 
failures, nor of the converse, have been found in the LOGAIS database. Of course, this does not mean that these events could not happen 
in general. To account for the possibility of these events, we would need to include the conditional probability of a client failure, given 
a server failure, and the converse. This model formulation is beyond the scope of this handbook and will be included in the next 
version of the model. 

Combining (8), (9), (10), and (11), the probability of a system failure by time t, given that a node fails, is: 

$P_{\text{sys}\mid\text{node fails}} = [P_{cc}][P_{nc}(t)] + [P_{cs}][P_{ns}(t)] = [1 - (1 - p_{cc})^{N_{cc}}][(p_{nc})^{N_{nc}(t)}] + [1 - (1 - p_{cs})^{N_{cs}}][(p_{ns})^{N_{ns}(t)}]$ (12) 

and the probability of a node failure due to software is: 

$p_{nf} = p_{cc} + p_{nc} + p_{cs} + p_{ns}$ (13) 
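
To make the behaviour of equation (12) concrete, the sketch below evaluates it for two cases: non-critical nodes still available, and both supplies exhausted. The value 24/4048 for the critical-client probability is the one quoted in the application example. The node counts and the other three probabilities are hypothetical placeholders, not values from Table 3.

```python
def prob_system_failure(p_cc: float, n_cc: int, p_nc: float, n_nc: int,
                        p_cs: float, n_cs: int, p_ns: float, n_ns: int) -> float:
    """Equation (12): probability of system failure, given a node failure."""
    any_critical_client = 1.0 - (1.0 - p_cc) ** n_cc     # equation (8)
    all_noncritical_clients = p_nc ** n_nc                # equation (9)
    any_critical_server = 1.0 - (1.0 - p_cs) ** n_cs     # equation (10)
    all_noncritical_servers = p_ns ** n_ns                # equation (11)
    return (any_critical_client * all_noncritical_clients
            + any_critical_server * all_noncritical_servers)

def prob_node_failure(p_cc: float, p_nc: float, p_cs: float, p_ns: float) -> float:
    """Equation (13): probability of a node failure due to software."""
    return p_cc + p_nc + p_cs + p_ns

# Hypothetical inputs, except p_cc = 24/4048.
params = dict(p_cc=24 / 4048, n_cc=5, p_nc=0.04, p_cs=0.0005, n_cs=1, p_ns=0.02)

with_backups = prob_system_failure(n_nc=6, n_ns=1, **params)
exhausted = prob_system_failure(n_nc=0, n_ns=0, **params)
print(f"backups available: {with_backups:.2e}, backups exhausted: {exhausted:.4f}")
```

Because the non-critical terms are raised to the power of the remaining node counts, the result stays very small while backups remain and rises sharply once $N_{nc}(t)$ and $N_{ns}(t)$ reach zero.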


Time to Failure Prediction 


In order to make Time to Failure predictions for each of the four types of node failures, the user first analyzes the defect data to 
determine what type of software defects could cause each of the four types of node failures; then the user partitions the defect data 
accordingly. More will be said about this process in the Application of Model section. Next the user applies equation (14) of the 
Schneidewind Software Reliability Model [AIA93, KEL95, LYU96, SCH92, SCH93] to make each of the four predictions, using 
the SMERFS software reliability tool [FAR93]. In equation (14), $T_F(t)$ is the predicted time (intervals) until the next $F_t$ failures (one 
or more) occur, $\alpha$ and $\beta$ are failure rate parameters, s is the first interval where the observed failure data is used, t is the current 
interval, and $X_{s,t}$ is the cumulative number of failures observed in the range s,t: 

$T_F(t) = \frac{1}{\beta} \log\left[\frac{\alpha}{\alpha - \beta (X_{s,t} + F_t)}\right] - (t - s + 1)$ (14) 

for $(\alpha/\beta) > (X_{s,t} + F_t)$ 
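
A minimal sketch of this prediction is given below. It assumes the usual published form of the Schneidewind time-to-failure equation, $T_F(t) = \log[\alpha / (\alpha - \beta (X_{s,t} + F_t))] / \beta - (t - s + 1)$, valid when $\alpha/\beta > X_{s,t} + F_t$. The parameter values in the example are hypothetical. In practice the parameters would be estimated by a tool such as SMERFS.

```python
import math

def time_to_next_failures(alpha: float, beta: float, s: int, t: int,
                          x_st: float, f_t: int) -> float:
    """Predicted time (in intervals) until the next f_t failures occur.

    alpha, beta: failure rate parameters; s: first interval of observed data
    used; t: current interval; x_st: cumulative failures observed in s..t.
    """
    if alpha / beta <= x_st + f_t:
        # The model predicts fewer than f_t remaining failures.
        raise ValueError("prediction undefined: alpha/beta must exceed x_st + f_t")
    return math.log(alpha / (alpha - beta * (x_st + f_t))) / beta - (t - s + 1)

# Hypothetical parameters, chosen only to exercise the formula.
print(time_to_next_failures(alpha=1.0, beta=0.05, s=1, t=10, x_st=8, f_t=1))
```

The guard clause mirrors the validity condition of the equation: when the predicted number of remaining failures, $\alpha/\beta - X_{s,t}$, is smaller than $F_t$, no time-to-failure prediction can be made for that many failures.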

Time to Failure predictions are made for critical clients, non-critical clients, critical servers, and non-critical servers. As the 
predicted failure times are recorded, the user observes whether the condition for system failure, as defined previously, has been met. 
If this is the case, a predicted system failure is recorded. Thus, in addition to monitoring the types of predicted failures (e.g., critical 
client), the process also involves monitoring $N_{nc}(t)$ and $N_{ns}(t)$ to identify the time t when either is reduced to zero, signifying that the 
supply of non-critical clients or non-critical servers has been exhausted. In this situation, a failure of a critical client or critical server, 
respectively, will result in a system failure. Thus the user predicts a system failure when the following expression is true (where "∧" 
means "AND" and "∨" means "OR"): 

((Predict critical client failure) ∧ ($N_{nc}(t) = 0$)) ∨ ((Predict critical server failure) ∧ ($N_{ns}(t) = 0$)) (15) 

If the predictions produce multiple node failures in the same interval (e.g., critical client and critical server), the user records 
multiple failures for that interval. 
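
The monitoring procedure can be sketched as a small routine that walks through a time-ordered list of predicted node failures, decrements the non-critical node counts, and applies expression (15). The failure order and the final time of 61.07 days follow the predicted scenario described in the application example. The other event times and the starting counts (six non-critical clients, one non-critical server) are assumptions made only for this illustration.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (predicted time in days, failure type: CC, NC, CS or NS)

def first_system_failure(events: List[Event], n_nc: int, n_ns: int) -> Optional[Event]:
    """Return the event at which expression (15) becomes true, or None.

    n_nc and n_ns are the numbers of non-critical clients and servers
    available at the start of the scenario.
    """
    for time, kind in sorted(events):
        if kind == "CC":
            if n_nc == 0:
                return (time, kind)      # no non-critical client can take over
            n_nc -= 1                    # a non-critical client replaces it
        elif kind == "CS":
            if n_ns == 0:
                return (time, kind)      # no non-critical server can take over
            n_ns -= 1                    # a non-critical server replaces it
        elif kind == "NC":
            n_nc = max(n_nc - 1, 0)      # failed non-critical client is not replaced
        elif kind == "NS":
            n_ns = max(n_ns - 1, 0)      # failed non-critical server is not replaced
        else:
            raise ValueError(f"unknown failure type: {kind}")
    return None

# Times of the first seven events are placeholders; only the order is from the text.
scenario = [(51.2, "NS"), (52.4, "NC"), (53.1, "NC"), (54.0, "CC"),
            (55.3, "NC"), (57.8, "NC"), (59.6, "CC"), (61.07, "CC")]
print(first_system_failure(scenario, n_nc=6, n_ns=1))
```

For a system with many nodes, a routine of this kind is one way to avoid tracking the counts by hand.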


APPLICATION OF THE MODEL 


Analysis of the Defect and Failure Data 

In this example the user applies the software reliability model to the Marine Corps LOGAIS system, a client-server logistical 
support system. In this system it is important that the reliability specification distinguish between failure states Degraded-Type 1, 
Degraded-Type 2, and System Failure, as previously defined (i.e., distinguish between node failures that cause performance 
degradation but allow the system to survive, and node failures that cause a system failure). This distinction is made when analyzing 
the system's defect data. The defect data used in the example are from the LOGAIS defect database, using the Defect Control System 
(DCS), a defect database management system which was used on the LOGAIS project [MHB96, MTP96]. 

In this Windows-based client-server system, the types of clients and servers that were previously defined are used, with 
corresponding types of defects and failures, as identified in the defect database [MHB96, MTP96]. The following short-hand notation 
for identifying the attributes of the defect database is used: 

o S: Software Defect 
o G: General Protection Fault (GPF) 
o N: Network Related Failure 
o C: System Crash 

The LOGAIS defect database is queried in order to identify the software defects that qualify as node failures. The following 
Boolean expressions, corresponding to the four types of node failures, are used: 

1. Critical Client Failure: COUNT as failures WHERE (S ∧ G ∧ N ∧ not C). A GPF causes a node failure (Degraded-Type 2) on a critical 
client, a client which must maintain communication with other nodes on the network (Network Mode), and the failure does not cause 
a System Crash (loss of server). 

2. Non-Critical Client Failure: COUNT as failures WHERE (S ∧ G ∧ not N ∧ not C). A GPF causes a node failure (Degraded-Type 1) 
on a non-critical client, a client which does not have to maintain communication with other nodes on the network (Standalone Mode), 
and the failure does not cause a System Crash (loss of server). 

3. Critical Server Failure: COUNT as failures WHERE (S ∧ not G ∧ N ∧ C). A System Crash causes a node failure (Degraded-Type 2) 
on a critical server, a server which must maintain communication with other nodes on the network (Network Mode), and the failure 
is not a GPF; it is more serious, resulting in the loss of a server. 

4. Non-Critical Server Failure: COUNT as failures WHERE (S ∧ not G ∧ not N ∧ C). A System Crash causes a node failure 
(Degraded-Type 1) on a non-critical server, a server which does not have to maintain communication with other nodes on the network, and the 
failure is not a GPF; it is more serious, resulting in the loss of a server. 

The above classification associates GPF with clients and System Crash with servers; it also associates Network Related Failures 
with critical node failures. Note that this is only an example. For other systems, different defect and failure classifications may be 
appropriate. 

The total failure count is obtained by taking the union of expressions 1-4 as follows: 

5. Total Failure Count: COUNT as failures WHERE (S ∧ ((G ∧ not C) ∨ (not G ∧ C))). This expression is used to verify the correctness 
of 1-4 because it should equal their sum. 
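
The five queries can be sketched as a small classifier over defect records. The record layout (four Boolean attributes named after the S, G, N and C flags) and the sample records are assumptions made for illustration. They do not reflect the actual schema of the Defect Control System.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Defect:
    s: bool  # Software Defect
    g: bool  # General Protection Fault (GPF)
    n: bool  # Network Related Failure
    c: bool  # System Crash

def classify(d: Defect) -> Optional[str]:
    """Return CC, NC, CS or NS per expressions 1-4, or None if not a node failure."""
    if not d.s:
        return None
    if d.g and not d.c:                  # GPF without a crash: client failure
        return "CC" if d.n else "NC"
    if d.c and not d.g:                  # crash that is not a GPF: server failure
        return "CS" if d.n else "NS"
    return None

def in_total_count(d: Defect) -> bool:
    """Expression 5: S AND ((G AND NOT C) OR (NOT G AND C))."""
    return d.s and ((d.g and not d.c) or (not d.g and d.c))

# Hypothetical records, for illustration only.
records = [Defect(True, True, True, False), Defect(True, True, False, False),
           Defect(True, False, False, True), Defect(True, False, False, False)]
counts = Counter(k for k in map(classify, records) if k is not None)
total = sum(in_total_count(d) for d in records)
print(dict(counts), total)
```

Here the total from expression 5 equals the sum of the four class counts, which is the cross-check described above.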
Observed Range and Prediction Range 


The major objective of reliability modeling is to predict future reliability over the prediction range of test or operational time of 
a system. However to do so, there must be a historical record of defects and failures for computing the model parameters and for 
making the best fit with the historical data; the data is collected during the observed range of test or operational time of a system. The 
length of the observed range is determined by the amount of data that has been collected prior to making a prediction, while the length 
of the prediction range is determined by the duration of the system's mission. The observed range in this example is intervals 1-50 and 
the prediction range is intervals 51-61. These ranges are arbitrary and selected only to illustrate the process. We note that once a 
system has been tested or operated over the prediction range, there will be observed defects and failures in this range. The observed 
defects and failures in the prediction range are listed in Table 2. The failure counts corresponding to types 1-5, above, are summarized 
in Table 3, which shows the empirical probabilities of node failure that are computed using equations (3)-(7) and (13). For example, 
for critical clients, the computation is 24/4048 = .005929. The user should verify the computations for the remaining types of nodes. 

Application Predictions 

Time to Failure 

Using equation (14) and failure data in the observed range 1-50 (not shown), we made predictions for Time to Failure, for t>50 days, 
for critical clients, non-critical clients, and non-critical servers, in Tables 4, 5 and 6, respectively. The predictions are made for a given 
number of failures (time to one failure for t>50 days, time to two failures for t>50 days, etc.). The predictions are compared with 
the actual failure data, with the relative error and average relative error for cumulative values shown. In the case of critical servers, 
there are only two actual failures, both of which occur in the observed range. Only one prediction of Time to Failure for one more 
failure could be made at t=50 for critical servers because the predicted remaining failures at t=50 is 1.40; therefore, critical server 
failures are not tabulated. In the case of non-critical nodes, the failure data is sufficiently dense to allow a failure count interval of one 
day. In the case of critical clients, the failure data was sparse; thus a five day interval was used for prediction, with these predictions 
converted to the one day intervals shown in Table 4. We note that predictions are difficult to make with this type of data because the 
defects and failures are not recorded in CPU execution time. Rather they are recorded in calendar time in batches, as shown in 
Table 2, based on administrative convenience. Many of these batches are submitted at the end of a workday. This time becomes the 
"submit date". 

Using the data in Tables 4-6, we merge and sequence the various types of failure predictions in Table 7. The purpose of this table 
is to construct the scenario of failures and surviving non-critical nodes so that the time of System Failure can be predicted. The table 
shows that seven node failures (i.e., the sequence NS, NC, NC, CC, NC, NC, CC) are predicted to occur before the system is predicted 
to fail. This occurs at t=61.07 days when there are no non-critical clients available and a critical client fails. No critical server failures 
are shown in this table because the prediction of Time to Failure of 99.35 days cumulative is beyond the prediction range of interest 
in this example. 

Using the data in Tables 4-6, we merge and sequence the various types of actual failures in Table 8. Similar to Table 7, the 
purpose of this table is to construct the scenario of actual failures and surviving non-critical nodes so that the actual time of System 
Failure can be determined and compared with the predicted values. As in the case of the predictions, this table shows that seven node 
failures (i.e., the sequence NC, NS, NC, NC, NC, NC, CC) occur before the system fails. This occurs at t=61 days when there are no 
non-critical clients available and a critical client fails. No critical server failures are shown in this table because they occurred prior 
to the range of this example. 

Probability of System Failure 

Lastly, using equation (12), we predict the probability of system failure, given a node failure, in column 5 of Table 9, as the system 
progresses through the predicted failure scenario that was shown in Table 7. Except for row 2 in Table 9, the actual probability is the 
same as the predicted probability because the actual failure scenario that was shown in Table 8 produces the same numbers of 
non-critical clients and servers that are shown in columns 6 and 7, respectively. Because the predicted and actual failure scenarios are 
identical, except for row 2, the predicted time to failure and type of node failure, columns 1 and 2, respectively, can be compared 
with the corresponding actual values in columns 3 and 4, for given probabilities of system failure. These values were reproduced from 
Tables 7 and 8, respectively. Because, for a given $P_{\text{sys}\mid\text{node fails}}$, the cumulative time to failure occurs later for the predicted values, 
the model is a bit optimistic with respect to reality for this example. Note that in the last row of Table 9 the system has not yet 
failed. Failure occurs when a critical client fails at Day 61.07 predicted (see Table 7) and at Day 61 actual (see Table 8). At this time 
there are no non-critical clients left to replace the failed critical client. 

The significant results that emerge from this analysis are that: 1) $P_{\text{sys}\mid\text{node fails}}$ is only significant (.029790) when the supply 
of both non-critical clients and non-critical servers has been exhausted and 2) $P_{\text{sys}\mid\text{node fails}}$ is significantly lower than the probability 
of any type of node failure caused by a software defect: $p_{nf}$ = .065705, obtained from equation (13) and computed in Table 3. Thus 
evaluations of system reliability should recognize that software failures are not necessarily equivalent to system failures and that 
assessments of software reliability that treat every failure as equivalent to a system failure will grossly understate system reliability. 

CONCLUSIONS AND FUTURE RESEARCH 

Based on the above approach, it appears feasible to develop a system software reliability model for a client-server system. In 
order to implement the approach, it is necessary to partition the defects and failures into classes that are then associated with critical 
and non-critical clients and servers. Once this is done, predictions are made of Time to Failure for each type; the predictions are 
classified according to those that would result in a node failure caused by a software defect and those that would result in a system 
failure caused by a series of software defects. Then the probability of system failure is computed. A significant result of the research
is that software failures should not be treated as the equivalent of system failures because to do so would grossly understate system 
reliability. 

In future research we will deal with the problem of how to apply the model to a system that has a large number of nodes. The 
technique that we described for monitoring the times when predicted node and system failures occur would be cumbersome for a large 
system. It appears that a program must be written to automate this process. Other possible future research activities include the 
following: extend the model to include hardware failures; develop measures of performance degradation, as nodes fail; include a node
repair rate to reflect the possibility of recovering failed nodes during the operation of the system; apply smoothing techniques, such 
as the moving average, to mitigate anomalies in calendar time defect data. 
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Failure States 



Node Type            Degraded - Type 1   Degraded - Type 2                  System Failure

Non-Critical Client  Node Failure        Does Not Apply                     Does Not Apply
Critical Client      Does Not Apply      Node Failure(s) and N_nc(t) > 0    Node Failure(s) and N_nc(t) = 0
Non-Critical Server  Node Failure        Does Not Apply                     Does Not Apply
Critical Server      Does Not Apply      Node Failure(s) and N_ns(t) > 0    Node Failure(s) and N_ns(t) = 0


Table 2 

Node Failure Count Database (Sample)

CC: Critical Client Node Failure
NC: Non-Critical Client Node Failure
CS: Critical Server Node Failure
NS: Non-Critical Server Node Failure

Interval  Defect ID                                 Number  Submit Date  CC  NC  CS  NS

51        2633,2634                                 2       1/24/95          X
51        2635,2636,2637,2638                       4       1/24/95                  X
52
53        2661,2662,2663,2664                       4       1/26/95          X
54        2641,2644,2645,2669,2671,2672,2673,3003   8       1/27/95          X
54        2640,2643,2670,2674,2675,2676,2783        7       1/27/95                  X
55        2450                                      1       1/30/95          X
56
57
58        2487                                      1       2/2/95                   X
59        2511,2512,2513                            3       2/3/95           X
60
61        3025,3026,3027,3029                       4       2/7/95       X

Table 3 

Summary of Node Failures (4048 Software Defects) 





Type of Node Failure     Number of Failures   Probability

1. Critical Client       24                   p_cc = .005929
2. Non-Critical Client   83                   p_nc = .020250
3. Critical Server       2                    p_cs = .000494
4. Non-Critical Server   158                  p_ns = .039032
5. Total                 267                  p_sw = .065705
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Table 4 

Critical Client Predictions Made at Time=50 Days
Observed Range=1,50 Days; Failure Count=[illegible]; Prediction Range>50 Days

               Predicted                           Actual
Given Number   Time to          Cumulative Time    Time to          Cumulative Time    Relative Error
of Failures    Failure (Days)   to Failure (Days)  Failure (Days)   to Failure (Days)  (Percent)

1              5.19             55.19              11               61                 -9.52
2              11.07            61.07              11               61                 +.11
3              17.88            67.88              11               61                 +11.28
4              25.95            75.95              11               61                 +24.51
5              35.86            85.86              36               86                 -.16

Average=9.12%


Table 5

Non-Critical Client Predictions Made at Time=50 Days
Observed Range=1,50 Days; Failure Count=36; Prediction Range>50 Days

               Predicted                           Actual
Given Number   Time to          Cumulative Time    Time to          Cumulative Time    Relative Error
of Failures    Failure (Days)   to Failure (Days)  Failure (Days)   to Failure (Days)  (Percent)

1              2.41             52.41              1                51                 +2.76
2              4.87             54.87              1                51                 +7.59
3              7.37             57.37              3                53                 +8.25
4              9.92             59.92              3                53                 +13.06
5              12.52            62.52              3                53                 +17.96

Average=9.92%


Table 6

Non-Critical Server Predictions Made at Time=50 Days
Observed Range=1,50 Days; Failure Count=108; Prediction Range>50 Days

               Predicted                           Actual
Given Number   Time to          Cumulative Time    Time to          Cumulative Time    Relative Error
of Failures    Failure (Days)   to Failure (Days)  Failure (Days)   to Failure (Days)  (Percent)

1              1.96             51.96              1                51                 +1.88
2              3.93             53.93              1                51                 +5.75
3              5.90             55.90              1                51                 +9.61
4              7.87             57.87              1                51                 +13.47
5              9.84             59.84              4                54                 +10.81

Average=8.30%
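
The relative errors in Tables 4 through 6 can be checked with a short calculation. The paper does not state the error formula; the sketch below assumes it is the signed difference between predicted and actual cumulative time to failure, divided by the actual value, and that the reported average is the mean of the absolute errors. Both assumptions are inferred from the tabulated numbers, and the function names are illustrative.

```python
def relative_error_percent(predicted_cumulative, actual_cumulative):
    """Signed relative error (percent) of a predicted cumulative time to failure."""
    return 100.0 * (predicted_cumulative - actual_cumulative) / actual_cumulative


def mean_absolute_error_percent(pairs):
    """Mean of the absolute relative errors over (predicted, actual) pairs."""
    errors = [abs(relative_error_percent(p, a)) for p, a in pairs]
    return sum(errors) / len(errors)


# (predicted, actual) cumulative times to failure from Table 6 (non-critical server)
table6 = [(51.96, 51), (53.93, 51), (55.90, 51), (57.87, 51), (59.84, 54)]

for predicted, actual in table6:
    print(f"{predicted:6.2f} {actual:3d} {relative_error_percent(predicted, actual):+6.2f}%")
print(f"average = {mean_absolute_error_percent(table6):.2f}%")
```

Under these assumptions the computed values agree with the Table 6 entries to two decimal places.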
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Table 7 

Predicted Time to Failure When Failures are Merged and Sequenced. Observed Range=1,50 Days; Prediction Range=51,61 Days
CC: Critical Client NC: Non-Critical Client NS: Non-Critical Server

Columns: Cumulative Time to Failure (Days); Time to Failure (Days); Type of Failure; Number of Non-Critical Clients; Number of Non-Critical Servers








Table 8 

Actual Time to Failure When Failures are Merged and Sequenced. Range=51,61 Days

CC: Critical Client NC: Non-Critical Client NS: Non-Critical Server

Columns: Cumulative Time to Failure (Days); Time to Failure (Days); Type of Failure





Table 9 

Probability of System Failure 


Predicted        Predicted   Actual           Actual      Probability of   Number of      Number of
Cumulative Time  Type of     Cumulative Time  Type of     System Failure   Non-Critical   Non-Critical
to Failure       Node        to Failure       Node        Given a Node     Clients        Servers
(Days)           Failure     (Days)           Failure     Failure          Available      Available

50                           50                           0.000019         5              1
51.96            NS                                       0.000494*        5*             0*
52.41            NC          51               NC,NS       0.000494         4              0
54.87            NC          51               NC          0.000494         3              0
55.19            CC          53               NC          0.000506         2              0
57.37            NC          53               NC          0.001087         1              0
59.92            NC          53               NC          0.029790         0              0


* Applies only to predicted values. 
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CLIENT-SERVER SOFTWARE RELIABILITY PREDICTION 


Too often the assumption is made, when doing software reliability modeling and 
prediction, that the software involves either a single module or node. The reality in 
today's increasing use of multi-node client-server and distributed systems is that there
are multiple entities of software that execute on multiple nodes that must be modeled 
in a system context, if realistic reliability predictions and assessments are to be made. 


Criticality and Redundancy


o If there are N_c clients and N_s servers in a client-server system, it is not necessarily
the case that a software failure in any of the N_c clients or N_s servers will cause the
system to fail. If such a system were to be modeled as a single entity, the predicted
reliability would be much lower than the true reliability because the prediction would
not account for criticality and redundancy.

o The first factor accounts for the possibility that the survivability of some clients and
servers will be more critical to continued system operation than others.

o The second factor accounts for the possibility of using redundant nodes to allow for
system recovery should a critical node fail.

o Identify which nodes — clients and servers — are critical and which are not critical. 
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DEFINITIONS 


Critical function: An application function that must operate for the duration of the 
mission, in accordance with its requirement, in order for the system to achieve its 
mission goal (e.g., the requirement states that a military field unit must be able to send 
messages to headquarters and receive messages from headquarters during the entire 
time that a military operation is being planned). 

Usually this type of function operates in the network mode, which means that the 
application requires more than a single client to perform its function; thus client to 
server or client to client communication is required. 

Non-critical function: An application function that does not have to operate for the 
duration of the mission in order for the system to achieve its mission goal (e.g., it is 
not necessary to perform word processing during the entire time that a military 
operation is being planned). 

Often this type of function operates in the standalone mode, which means that a single 
client performs the application function; thus client to server or client to client 
communication is not required, except for the possible initial downloading of a 
program from a file server or the printing of a job at a print server.
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DEFINITIONS (Continued) 

Critical clients and servers: Nodes with critical functions, as defined above. These
nodes must be kept operational for the system to survive, either by incurring no
failures or by reconfiguring non-critical nodes to operate as critical nodes.

Non-critical clients and servers: Nodes with non-critical functions, as defined
above. These nodes also act as backups for the critical nodes, should the critical nodes
fail.

Software Defect: Any undesirable deviation in the operation of the software from its 
intended operation, as stated in the software requirements. 

Software Failure: A defect in the software that causes a node (either a client or a 
server) in a client-server system to be unable to perform its required function within 
specified performance requirements (i.e., a node failure). 

System Failure: The state of a client-server system, which has experienced one or 
more node failures, wherein there are insufficient numbers and types of nodes 
available for the system to perform its required functions within specified performance 
requirements. 
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MODEL FORMULATION 

System Nodes 

N_cc: Number of Critical Client nodes.

N_nc(t): Number of Non-Critical Client nodes.

N_cs: Number of Critical Server nodes.

N_ns(t): Number of Non-Critical Server nodes.

where the total number of nodes N = N_cc + N_nc(t) + N_cs + N_ns(t).

As long as the system survives, N_cc and N_cs are constants because a failure of a
critical node will result in a non-critical node replacing it, if there is a non-critical
node available. A change in software configuration may be necessary on the former
non-critical node in order to run the failed critical node's software.

If a critical node fails, the system fails if there are no non-critical nodes available
on which to run the failed critical node's software.

In contrast, N_nc(t) and N_ns(t) are decreasing functions of operating time because
these nodes replace failed critical nodes, and are not themselves replaced, where
N_nc(0) is the number of non-critical clients and N_ns(0) is the number of non-critical
servers at the start of system operation, respectively.

In addition, if a non-critical node fails, the function that had been operational on
the failed node can be continued on another node of this type and the system can
continue to operate in a degraded state.

When either a non-critical node replaces a critical node or a non-critical node fails,
N_nc(t) or N_ns(t) is decreased by one, as appropriate.
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Node Failure Probabilities 

p_cc: probability of a software defect causing a critical client node to fail.
p_nc: probability of a software defect causing a non-critical client node to fail.
p_cs: probability of a software defect causing a critical server node to fail.
p_ns: probability of a software defect causing a non-critical server node to fail.

Thus given a node failure, we have the following function for the probability of
system failure:

P_sys/node fails = f(N_cc, p_cc, N_nc, p_nc, N_cs, p_cs, N_ns, p_ns)

Estimating Node Failure Probabilities 

The four probabilities are estimated from data in a defect database as follows:

$$p_{cc} = \sum_i f_{cc}(i) / D$$, where f_cc(i) is the critical client node failure count in interval i;
$$p_{nc} = \sum_i f_{nc}(i) / D$$, where f_nc(i) is the non-critical client node failure count in interval i;
$$p_{cs} = \sum_i f_{cs}(i) / D$$, where f_cs(i) is the critical server node failure count in interval i;
$$p_{ns} = \sum_i f_{ns}(i) / D$$, where f_ns(i) is the non-critical server node failure count in interval i;

and the total defect count across all intervals is $$D = \sum_i d(i)$$, where i is the identification
of an interval of operating time of the software and d(i) is the total defect count in
interval i.

In a specific application, Boolean expressions are used to search the defect
database and extract the failure counts (e.g., f_cc(i)) that are used to compute the above
equations. These expressions specify the conditions that qualify a defect as a node
failure (e.g., defect that is a General Protection Fault that affects network operations
on a Windows-based system).
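
The estimation step above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function name and the dictionary layout are assumptions, and the per-interval counts are collapsed into the totals reported in Table 3 (24, 83, 2, and 158 node failures out of 4048 defects).

```python
def node_failure_probabilities(failure_counts, total_defects):
    """Estimate p_cc, p_nc, p_cs, p_ns as (sum of per-interval failure counts) / D.

    failure_counts maps a node type to its list of per-interval counts f(i);
    total_defects is D, the defect count summed over all intervals.
    """
    if total_defects <= 0:
        raise ValueError("total defect count D must be positive")
    return {kind: sum(counts) / total_defects for kind, counts in failure_counts.items()}


# Totals from Table 3, each given as a single aggregated "interval" (an assumption
# made for brevity; the real data would hold one count per interval).
counts = {"cc": [24], "nc": [83], "cs": [2], "ns": [158]}
p = node_failure_probabilities(counts, total_defects=4048)
p_sw = sum(p.values())  # probability of any node failure due to software

for kind, value in p.items():
    print(f"p_{kind} = {value:.6f}")
print(f"p_sw = {p_sw:.6f}")
```

Computed this way, p_cc, p_cs, and p_ns agree with Table 3 to six decimal places, while 83/4048 gives about .020504 rather than the printed p_nc = .020250, so the printed value and the printed total may rest on a slightly different count.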
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Failure States 

At a given time t, the system can be in one of three failure states that pertain to
the survivability of the system, as follows, in decreasing order of capability:

Degraded - Type 1: A software defect in a non-critical node causes the node to fail.
As a result, the system operates in a degraded state, with one less non-critical node.
No reconfiguration is necessary because the failed node is not replaced.

Degraded - Type 2: A software defect in a critical node causes the node to fail. As
a result, the system operates in a degraded state, but one that is more severe than
Type 1, because there would be both a temporary loss of one critical node during
reconfiguration and a permanent loss of one non-critical node (i.e., one of the
non-critical nodes takes over the function of the failed critical node). Under certain
conditions, this type of node failure can cause a system failure.
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The failure states are summarized in Table 1. 

Table 1 
Failure States 



Node Type            Degraded - Type 1   Degraded - Type 2               System Failure

Non-Critical Client  Node Failure        Does Not Apply                  Does Not Apply
Critical Client      Does Not Apply      Node Failure and N_nc(t) > 0    Node Failure and N_nc(t) = 0
Non-Critical Server  Node Failure        Does Not Apply                  Does Not Apply
Critical Server      Does Not Apply      Node Failure and N_ns(t) > 0    Node Failure and N_ns(t) = 0
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System Failure Probability 

The probability that one or more critical clients fail, given that the software
fails, is:

$$P_{cc} = 1 - (1 - p_{cc})^{N_{cc}}$$

The probability that all non-critical clients N_nc(t) have failed by time t, given that
the software fails, is:

$$P_{nc}(t) = (p_{nc})^{N_{nc}(t)}$$

The probability that one or more critical servers N_cs fail, given that the software
fails, is:

$$P_{cs} = 1 - (1 - p_{cs})^{N_{cs}}$$

The probability that all non-critical servers N_ns(t) have failed by time t, given that
the software fails, is:

$$P_{ns}(t) = (p_{ns})^{N_{ns}(t)}$$

Combining the above four equations, the probability of a system failure by time t,
given that a node fails, is:

$$P_{\text{sys/node fails}} = [P_{cc}][P_{nc}(t)] + [P_{cs}][P_{ns}(t)] = [1 - (1 - p_{cc})^{N_{cc}}][(p_{nc})^{N_{nc}(t)}] + [1 - (1 - p_{cs})^{N_{cs}}][(p_{ns})^{N_{ns}(t)}]$$

where the first product is the Probability of Client Failure and the second product is
the Probability of Server Failure.

Probability of a Node Failure Due to Software

$$p_{sw} = p_{cc} + p_{nc} + p_{cs} + p_{ns}$$
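
The combined equation can be evaluated directly. The sketch below uses the Table 3 probabilities and assumes the configuration described for Figures 1 and 2 (five critical clients and one critical server); the function and variable names are illustrative and not taken from the paper.

```python
def system_failure_probability(p_cc, p_nc, p_cs, p_ns, n_cc, n_nc_t, n_cs, n_ns_t):
    """Probability of system failure given a node failure.

    Client term: at least one of n_cc critical clients fails AND all n_nc_t
    remaining non-critical clients have failed. The server term is analogous.
    """
    client_term = (1.0 - (1.0 - p_cc) ** n_cc) * p_nc ** n_nc_t
    server_term = (1.0 - (1.0 - p_cs) ** n_cs) * p_ns ** n_ns_t
    return client_term + server_term


# Node failure probabilities as printed in Table 3
P = dict(p_cc=0.005929, p_nc=0.020250, p_cs=0.000494, p_ns=0.039032)

# Assumed configuration: 5 critical clients, 1 critical server (Figures 1 and 2)
for n_nc_t, n_ns_t in [(5, 1), (5, 0), (2, 0), (1, 0), (0, 0)]:
    prob = system_failure_probability(**P, n_cc=5, n_nc_t=n_nc_t, n_cs=1, n_ns_t=n_ns_t)
    print(f"N_nc(t)={n_nc_t} N_ns(t)={n_ns_t} P={prob:.6f}")
```

With these inputs the results match the Probability of System Failure column of Table 9 to six decimal places, which suggests that the table was computed for this configuration.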
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Figure 1. Surviving Configuration 


Model Concepts

o The model concepts are illustrated in Figures 1 and 2, where there are five critical
clients, five non-critical clients, one critical server, and one non-critical server.

o Figure 1 shows a surviving configuration, where a critical client fails and a critical
server fails but there are non-critical clients and a non-critical server to take over the
functions of the failed nodes.

- The consequence of this configuration is a Degraded - Type 2 failure mode.
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CS: Critical Server



Figure 2. Failing Configuration 


o Figure 2 shows a failing configuration where there are no non-critical clients or
servers to take over for the failed nodes.

- The consequence of this configuration is a system failure.
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Time to Failure Prediction 

In order to make Time to Failure predictions for each of the four types of node
failures, we first analyze the defect data to determine what type of software defects
could cause each of the four types of node failures; then we partition the defect data
accordingly. Next we apply the time to failure equation of the Schneidewind Software
Reliability Model to make each of the four predictions, using the SMERFS software
reliability tool.

In the equation, T_F(t) is the predicted time (intervals) until the next F_t failures (one
or more) occur, α and β are failure rate parameters, s is the first interval where the
observed failure data is used, t is the current interval, and X_{s,t} is the cumulative
number of failures observed in the range s,t.

$$T_F(t) = \frac{1}{\beta}\,\log\left[\frac{\alpha}{\alpha - \beta\,(X_{s,t} + F_t)}\right] - (t - s + 1) \quad \text{for } \alpha/\beta > X_{s,t} + F_t$$
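
The time to failure equation can be sketched as a small function. The parameter values below are hypothetical and chosen only so that the validity condition holds; in the study the parameters were estimated with SMERFS, and their values are not given here.

```python
import math


def time_to_failure(alpha, beta, x_st, f_t, t, s):
    """Predicted intervals until the next f_t failures (Schneidewind model).

    alpha, beta: failure rate parameters
    x_st: cumulative failures observed in the range s,t
    t: current interval; s: first interval of observed data used
    Defined only when alpha/beta > x_st + f_t.
    """
    remaining = alpha - beta * (x_st + f_t)
    if remaining <= 0:
        raise ValueError("prediction undefined: requires alpha/beta > X_{s,t} + F_t")
    return math.log(alpha / remaining) / beta - (t - s + 1)


# Hypothetical parameters (not from the paper): alpha/beta = 20 failures in the limit
alpha, beta = 0.4, 0.02
for f_t in (1, 2, 3):
    days = time_to_failure(alpha, beta, x_st=12, f_t=f_t, t=50, s=1)
    print(f"F_t={f_t}: {days:.2f} intervals after t=50")
```

The guard mirrors the stated condition: once the requested failure count reaches the model's asymptotic total α/β, no finite prediction exists.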
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Time to Failure Prediction (Continued) 

Time to Failure predictions are made for critical clients, non-critical clients, 
critical servers, and non-critical servers. 

As the predicted failure times are recorded, we observe whether the condition for 
system failure has been met. If this is the case, a predicted system failure is recorded. 
Thus, in addition to monitoring the types of predicted failures (e.g., critical client), the
process also involves monitoring N_nc(t) and N_ns(t) to identify the time t when either is
reduced to zero, signifying that the supply of non-critical clients or non-critical servers
has been exhausted. In this situation, a failure of a critical client or critical server,
respectively, will result in a system failure.

Thus we predict a system failure when the following expression is true:

((Predict critical client failure) ∧ (N_nc(t) = 0)) ∨ ((Predict critical server
failure) ∧ (N_ns(t) = 0)).

If our predictions produce multiple node failures in the same interval (e.g., critical 
client and critical server), we record multiple failures for that interval. 
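
The monitoring procedure can be sketched as a walk over the merged, time-ordered failure predictions. The event list below is taken from the predicted columns of Table 9, with the critical client failure at Day 61.07 added from the accompanying discussion; the function name, the tuple layout, and the choice to clamp the counts at zero are assumptions of this sketch.

```python
def first_system_failure(events, n_nc, n_ns):
    """Return the time of the first predicted system failure, or None.

    events: iterable of (time, node_type), node_type in {"CC", "NC", "CS", "NS"}
    n_nc, n_ns: non-critical clients and servers available at the start
    """
    for time, kind in sorted(events):
        if kind == "CC":
            if n_nc == 0:          # no spare client to take over: system failure
                return time
            n_nc -= 1              # a non-critical client replaces the failed node
        elif kind == "CS":
            if n_ns == 0:          # no spare server to take over: system failure
                return time
            n_ns -= 1
        elif kind == "NC":
            n_nc = max(n_nc - 1, 0)  # degraded operation, one less spare (clamped)
        elif kind == "NS":
            n_ns = max(n_ns - 1, 0)
        else:
            raise ValueError(f"unknown node type: {kind}")
    return None


predicted = [(51.96, "NS"), (52.41, "NC"), (54.87, "NC"), (55.19, "CC"),
             (57.37, "NC"), (59.92, "NC"), (61.07, "CC")]
print(first_system_failure(predicted, n_nc=5, n_ns=1))
```

Several failures in the same interval are simply processed one after another, which corresponds to recording multiple failures for that interval.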
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Node Failure Counts 

o The LOGAIS defect database was queried in order to identify the software defects 
that qualify as node failures. The following Boolean expressions, corresponding to the 
four types of node failures, were used: 

1. Critical Client Failure: COUNT as failures WHERE (S ∧ G ∧ N ∧ not C). A GPF
causes a node failure (Degraded-Type 2) on a critical client, a client which must
maintain communication with other nodes on the network (Network Mode), and the
failure does not cause a System Crash (loss of server).

2. Non-Critical Client Failure: COUNT as failures WHERE (S ∧ G ∧ not N ∧ not C). A
GPF causes a node failure (Degraded-Type 1) on a non-critical client, a client which
does not have to maintain communication with other nodes on the network
(Standalone Mode), and the failure does not cause a System Crash (loss of server).
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Node Failure Counts (Continued)

3. Critical Server Failure: COUNT as failures WHERE (S ∧ not G ∧ N ∧ C). A System
Crash causes a node failure (Degraded-Type 2) on a critical server, a server which
must maintain communication with other nodes on the network (Network Mode), and
the failure is not a GPF; it is more serious, resulting in the loss of a server.

4. Non-Critical Server Failure: COUNT as failures WHERE (S ∧ not G ∧ not N ∧ C). A
System Crash causes a node failure (Degraded-Type 1) on a non-critical server, a
server which does not have to maintain communication with other nodes on the
network, and the failure is not a GPF; it is more serious, resulting in the loss of a
server.

o The above classification associates GPF with clients and System Crashes with
servers; it also associates Network Related Failures with critical node failures.
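
The four Boolean expressions can be sketched as a classifier over defect records. The record layout and the sample records are hypothetical; G, N, and C are read as GPF, Network Mode, and System Crash as described above, while the meaning of the S flag is not spelled out in these slides and is treated here as an opaque qualifying condition.

```python
from collections import Counter


def classify(defect):
    """Map a defect record with Boolean flags S, G, N, C to a node failure type.

    Returns "CC", "NC", "CS", "NS", or None when no expression matches.
    """
    s, g, n, c = defect["S"], defect["G"], defect["N"], defect["C"]
    if not s:
        return None
    if g and not c:                 # GPF without a system crash: client failure
        return "CC" if n else "NC"
    if c and not g:                 # system crash that is not a GPF: server failure
        return "CS" if n else "NS"
    return None


# Hypothetical records; the IDs and flag values are illustrative only.
defects = [
    {"id": 1, "S": True, "G": True, "N": True, "C": False},
    {"id": 2, "S": True, "G": True, "N": False, "C": False},
    {"id": 3, "S": True, "G": False, "N": True, "C": True},
    {"id": 4, "S": True, "G": False, "N": False, "C": True},
    {"id": 5, "S": False, "G": True, "N": True, "C": False},
]
failure_counts = Counter(kind for kind in map(classify, defects) if kind)
print(dict(failure_counts))
```

Records that match none of the four expressions are counted as defects but not as node failures, which is why the node failure probabilities sum to less than one.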
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Defect Database 

An example of the defect database is shown in Table 2, where Interval identifies
the period for counting defects (daily in this case), Defect ID is the identification
assigned the defect, Number is the count of defects in the interval, Submit is the date
the defect was submitted to the defect database, and the last four columns indicate
whether the defects resulted in one of the four types of failure.

Upon querying the defect database, using Boolean expressions 1-4, we find the 
failure counts listed in the sample database in Table 2. The failure counts 
corresponding to types 1, 2, 3, and 4 above are summarized in Table 3, which shows 
the empirical probabilities of node failure. 
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Table 2 


Defect Database (Sample) 


CC: Critical Client Node Failure NC: Non-Critical Client Node Failure

CS: Critical Server Node Failure NS: Non-Critical Server Node Failure


Interval  Defect ID                                 Number  Submit    CC  NC  CS  NS

51        2633,2634                                 2       1/24/95       X
51        2635,2636,2637,2638                       4       1/24/95               X
52
53        2661,2662,2663,2664                       4       1/26/95       X
54        2641,2644,2645,2669,2671,2672,2673,3003   8       1/27/95       X
54        2640,2643,2670,2674,2675,2676,2783        7       1/27/95               X
55        2450                                      1       1/30/95       X
56
57
58        2487                                      1       2/2/95                X
59        2511,2512,2513                            3       2/3/95        X
60
61        3025,3026,3027,3029                       4       2/7/95    X





Table 3 

Summary of Node Failures (4048 Software Defects) 



Type of Node Failure     Number of Failures   Probability

1. Critical Client       24                   p_cc = .005929
2. Non-Critical Client   83                   p_nc = .020250
3. Critical Server       2                    p_cs = .000494
4. Non-Critical Server   158                  p_ns = .039032
5. Total                 267                  p_sw = .065705
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CONCLUSIONS

o Based on the above approach, it is feasible to develop a system software reliability model 
for a client-server system. 

o In order to implement the approach, it is necessary to partition the defects and failures into 
classes that are then associated with critical and non-critical clients and servers. 

o Once this is done, predictions are made of Time to Failure for each class; the predictions 
are classified according to those that would result in a software failure and those that would 
result in a system failure; and the probability of system failure is computed.

o It is important that software failures not be treated as the equivalent of system failures 
because to do so would grossly understate system reliability. 

o Possible model enhancements include the following: extend the model to include hardware 
failures; develop measures of performance degradation, as nodes fail; include a node repair 
rate to reflect the possibility of recovering failed nodes during the operation of the system. 
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Operational Test Readiness Assessment of an 
Air Force Software System: A Case Study

Amrit L. Goel, Syracuse University 
Capt. Brian Hermann, AFOTEC, NM 
Major Randy McCanne, Scott AFB, IL 

This paper describes a new methodology that was developed to assess operational test 
readiness of an Air Force software system under development over a period of several years. 
The evaluation is primarily based on an analysis of the open and closed problem reports. Other 
factors such as test completeness and requirements stability are also considered, but mostly in 
an implicit way. The methodology is objective, has a sound mathematical foundation and can 
be employed for evaluation of any large software system. 

AFOTEC Software Maturity Evaluation Guide provides details of the data needs and 
assessment approach to be used for Air Force Systems. The key criterion is to determine whether 
the unresolved severity 1 and 2 failures can be resolved prior to the scheduled OT&E (Operational
Test and Evaluation) start date. The approach taken in the AFOTEC Guide is to estimate the 
time required for resolution based on the current unresolved failures and an average closure rate. 
This methodology extends and builds upon the current AFOTEC approach by (i) considering the 
as yet undetected faults in the system, and (ii) using two different estimated fault closure rates. 
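
The baseline estimate described for the AFOTEC Guide (current unresolved failures divided by an average closure rate) can be sketched as follows. The dates, counts, rates, and function names are hypothetical, and the sketch deliberately omits the extensions proposed in this paper (undetected faults and a second closure rate).

```python
import math
from datetime import date, timedelta


def projected_resolution_date(today, open_sev12, closures_per_week):
    """Date by which the open severity 1 and 2 problems would be closed
    at a constant average closure rate."""
    if closures_per_week <= 0:
        raise ValueError("closure rate must be positive")
    days_needed = math.ceil(7.0 * open_sev12 / closures_per_week)
    return today + timedelta(days=days_needed)


def ready_for_ote(today, ote_start, open_sev12, closures_per_week):
    """True when the open problems are projected to close by the OT&E start date."""
    return projected_resolution_date(today, open_sev12, closures_per_week) <= ote_start


# Hypothetical program status: 12 open problems, 3 closed per week on average
today = date(1996, 1, 1)
print(projected_resolution_date(today, open_sev12=12, closures_per_week=3))
print(ready_for_ote(today, date(1996, 2, 15), open_sev12=12, closures_per_week=3))
```

Because the estimate ignores problems that have not yet been detected, it is optimistic whenever testing is still uncovering new severity 1 and 2 failures.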

An equivalent problem in commercial applications is to determine readiness for beta test, 
readiness for release, or readiness for first customer ship. Several studies over the past twenty 
years have attempted to address this problem for both defense and commercial systems. Most 
of these have proposed using a decision rule in conjunction with some software failure model to 
predict software readiness. Others have proposed approaches based on minimizing a predefined 
cost function. 
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The new methodology employs statistical trend tests and software reliability models for 
assessing readiness for dedicated OT&E. It explicitly incorporates the use of these techniques 
in the decision making process by employing an iterative three step procedure: 

Step 1. Perform statistical trend analysis 

Step 2. Select software reliability model that best fits the system failure data 
Step 3. Conduct a readiness assessment using the results of Steps 1 and 2 

The basic idea behind the proposed methodology is to determine objective readiness 
information from the available data on problem reports and their closures. In addition to studying 
the plots of cumulative open and closed problems and average time to close, it uses a statistical 
assessment of the trends in the failures and closed fault curves. Each of the steps has a solid 
mathematical foundation. 

In particular, it uses the Laplace trend statistic for determining whether the software 
failure rate is steady, improving, or deteriorating. When an improving trend is indicated, the 
failure process is modeled by an appropriate software reliability model. 
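
The abstract names the Laplace trend statistic without giving its formula. The sketch below assumes the commonly used time-truncated form of the Laplace factor for failure times observed over (0, T]; the 1.96 threshold (a two-sided 5% normal critical value), the function names, and the sample data are assumptions of this sketch rather than details from the paper.

```python
import math


def laplace_factor(failure_times, observation_end):
    """Laplace trend factor for failure times observed over (0, observation_end]."""
    n = len(failure_times)
    if n == 0 or observation_end <= 0:
        raise ValueError("need at least one failure and a positive observation period")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2.0) / (observation_end * math.sqrt(1.0 / (12.0 * n)))


def trend(u, critical=1.96):
    """Classify the failure rate trend from the Laplace factor u."""
    if u < -critical:
        return "improving"       # failures cluster early: failure rate decreasing
    if u > critical:
        return "deteriorating"   # failures cluster late: failure rate increasing
    return "steady"


early = [1, 2, 3, 5, 8, 13, 21]        # hypothetical failure times, T = 100
late = [79, 87, 92, 95, 97, 98, 99]
even = [10, 30, 50, 70, 90]
for name, times in [("early", early), ("late", late), ("even", even)]:
    print(name, trend(laplace_factor(times, 100)))
```

Only when the trend is classified as improving does it make sense, under the methodology described, to go on and fit a software reliability model.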

Information from the trend plots is used to guide model selection as well as to obtain 
initial model parameter estimates. Next, the reliability model is used to estimate the future 
failure detection pattern. An initial assessment of OT readiness is then made by accounting for 
the number of failures remaining open, the problem closure rate and the expected new failures 
likely to be detected. This is done using the reliability model selected, the actual number of open 
problems and a stochastic model fitted to the problem closure curve. Such assessments are made 
for four different cases. Information from the Laplace trend plots and other factors such as rate 
of testing can also be used to decide whether earlier data should or should not be considered for 


SEW Proceedings 


262 


SEL-96-002 



modeling and readiness assessment. 


The presentation will address the following topics: 

• Proposed methodology 

• Laplace trend test and its relationship to software reliability models 

• Description of development data from an Air Force system 

• Analyses of open and closed problems, and readiness assessment 

• Limitations and benefits of the methodology. 

The methodology described here is currently being used on other commercial and defense 
systems. This paper will provide an assessment of the experience gained and problems 
encountered in these applications. Suggestions for improvements and any progress on them will 
also be discussed. 

Keywords/Phrases : 

• Software Change Trends

• Software Defect Density 

• Software Immaturity Case Study and Lessons Learned 

• Software Maturity 

• Software Operational Testing 

• Software Problem Trends 

• Software Test Readiness 
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1. Introduction 

Prior to purchasing space, aircraft, or communications systems, the Air Force 
operationally tests them to ensure they meet the specified needs of their users. The Air 
Force Operational Test and Evaluation Center (AFOTEC) conducts these operational 
tests for the Air Force. Since many modern systems rely heavily on software, the Air
Force requires software to be mature before beginning these lengthy, expensive tests. 

Software maturity is a measure of the software’s progress toward meeting documented 
user requirements. The software analysis division at AFOTEC uses software problem, 
change, and failure tracking data to help demonstrate when software has sufficiently met 
requirements and fixed identified problems. The concept and evaluation are simple, but 
rarely considered by developers and acquirers prior to AFOTEC involvement. 

AFOTEC evaluates software maturity with three distinct goals: 


1. Test Readiness: Reduce tax dollars wasted on testing immature systems.

2. Readiness for Fielding: Determine how far the software has progressed toward
satisfying user needs.

3. Identify Software Maturity Drivers: Identify portions of the software system which
currently generate the most changes and may, therefore, be expected to generate the
largest future maintenance effort. Where possible this information should be used to
improve the software prior to fielding.


Table 1: Software Maturity Evaluation Goals 


2. Evaluation Background

2.1 Software Maturity Data and Collection 

The evaluation begins with the software maturity database. Many programs use different 
names, but the required data is almost always collected by development organizations. 
Collection and analysis of the data typically begin when the software is placed under 
formal configuration control and continues through fielding of the software. The 
minimum data required to evaluate software maturity is shown in Table 2. 


1. Software Change (Problem) Number 

2. Description 

3. Computer Software Configuration Item (CSCI) Identifier 

4. Severity Level 

5. Date Change Opened (or problem found) 

6. Date Change (Problem) Closed and Implemented 

Table 2: Software Maturity Data 


During the development and initial testing, developers and acquirers work together to 
assign a severity level rating to each problem. Later during operational testing, AFOTEC 
is responsible for scoring of software problems. The Air Force uses a standard five-point 
scale shown in Table 3. 

2.2 Severity Level Categorization 

Current Air Force policy requires that no system can progress to the operational testing 
phase with open severity level one or two software problems. According to these 
definitions, severity level one and two problems imply the system does not meet user 
needs and therefore operational testing would be a waste of time and money. 


Severity Level   Description

1                Mission failure or jeopardizes safety.
2                Mission degraded with no possible work-around.
3                Mission degraded but a work-around solution is known.
4                Operator inconvenience or annoyance.
5                Any other change.


Table 3: Software Problem Severity Levels (MIL Standard 498)


2.3 Weighting of Severity Levels 

To help estimate the operational impact of each change, we assign a weight to each 
severity level (Table 4). The description of the trend charts will show how these 
weightings can help to distinguish between many insignificant problems and many 
important problems. 


Severity Level   Weight (Change Points)

1                30
2                15
3                5
4                2
5                1


Table 4: Weighting Factors 
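
The weighting in Table 4 can be sketched as a simple change-point total. The weights come from the table; the function name and the two sample problem lists are hypothetical and only illustrate how weighting separates many minor problems from a few important ones.

```python
# Weights (change points) per severity level, from Table 4
SEVERITY_WEIGHTS = {1: 30, 2: 15, 3: 5, 4: 2, 5: 1}


def change_points(severities):
    """Total weighted change points for a list of problem severity levels."""
    return sum(SEVERITY_WEIGHTS[level] for level in severities)


many_minor = [5] * 20 + [4] * 5      # 25 low-severity problems (hypothetical)
few_major = [1, 2]                   # 2 high-severity problems (hypothetical)

print(len(many_minor), change_points(many_minor))
print(len(few_major), change_points(few_major))
```

In this example a raw count ranks the first list as far worse, while the weighted total ranks the two severe problems higher, which is the distinction the trend charts are meant to expose.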


2.4 Maturity Evaluation and Analysis Tool 

AFOTEC developed a Microsoft® Excel for Windows™ based tool, called Maturity 
Evaluation and Analysis Tool (MEAT), to automate the data manipulation, produce trend 
charts, and speed analysis and reporting. The tool and user’s manual are available at no 
cost from HQ AFOTEC/SAS. 
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2.5 Evaluation Indenture Level 

While software maturity can be evaluated at the system software level, it is also beneficial 
to look at maturity from lower indenture levels. Selecting the appropriate evaluation 
indenture level is based on software size, number of changes, and the length of time the 
software change data is collected. As a general rule, we suggest the software maturity 
should be evaluated to at least the CSCI level. For some large programs, it will be 
possible and beneficial to delve deeper to the computer software component (CSC) 
indenture level. In either case, the results help to determine which components or 
configuration items are causing maturity problems. This specific information helps the 
acquiring organization and the developer more effectively address problems. 

2.6 Synthesis of Many Trends 

Software maturity is not a single trend or evaluation. It is a synthesis of many trends that 
must be considered together with the external factors that influence them. 



Figure 1: Software Maturity - A Synthesis of Many Trends 

2.7 External Factors 


2.7.1 Test Rate 

One of the external factors that can affect software maturity is developmental test 
schedule. This aspect can be seen in both test rate and test completeness. An 
understanding of test rate helps the evaluator determine if software appears mature only 
because testing has slowed, or explain an unusually high change origination rate resulting 
from an aggressive test schedule. The test rate should, in fact, affect the slope of the total 
originated changes curve. A sample test rate chart is shown in Figure 2. 



2.7.2 Test Completeness 

Another way program schedule can affect software maturity is through test completeness. 
This measure enables the evaluator to estimate confidence in the software maturity 
evaluation. A high percentage of successfully completed test procedures, with respect to 
the total number of test procedures, indicates testing has identified a correspondingly high 
percentage of problems. One drawback to this measure is that traceability between test 
procedures and requirements or functions is not part of test completeness, but it is 
necessary to verify the thoroughness of testing. Figure 3 shows an example of test 
completeness. Notice the total number of test procedures typically increases during the 
development and testing. 

2.7.3 Requirements Stability 

Another factor which influences software maturity trends is requirements stability. 
Software requirements continue to grow and change in nearly every development. New 
or modified requirements will likely drive software changes and increase the slope of the 
total originated changes curve. Knowing the cause for software changes can help to 
pinpoint solutions. 

3. Trend Charts 


3.1 Weighted and Unweighted Software Changes 

This basic maturity chart (Figure 4) shows the total changes originated, closed, and 
remaining trends. This chart is also a good example of the ideal shape of each trend line. 
To indicate maturity or progress toward maturity, the total changes originated trend 
should begin to level off. This indicates testing is finding problems at a lower rate than 
earlier in the development and testing. The total changes closed curve should closely 
follow the identified changes. Ideally, all identified changes would be closed and the 
remaining changes curve would show no backlog. This chart is also presented in an 
unweighted form as well as individually for each severity level. 


[Chart: Accumulated Software Changes (weighted) as of 25 Dec 95. Series: Total Originated, Total Closed, Remaining. X-axis: Period Number (Quarterly), 1 to 26.] 


Figure 4: Accumulated Software Changes (Weighted) 


3.2 Remaining Problems 

Although the remaining changes trend in an unweighted chart shows the current software 
problem/change backlog, Figure 5 presents a more useful view. This stacked bar chart 
shows the overall backlog trend as well as each severity level’s contribution to the total 
backlog. 

[Chart: Remaining Software Problems (Unweighted) as of 25 Dec 95] 



Figure 5: Remaining Software Problems 


3.3 Average Severity Level 

In the next chart (Figure 6), we present the average severity level of all originated, closed, 
and remaining changes. Ideally, the average severity level of problems should drop over 
time. Another good sign is if remaining changes are of a lower average severity level 
than those changes already closed. This indicates that the developer is doing a good job 
prioritizing his efforts. 
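A minimal sketch of the comparison described above, assuming (as Section 4.1.2 does) that the chart plots the average weight of changes using the Table 4 weights. The severity lists are hypothetical.

```python
# Severity-level weights from Table 4.
SEVERITY_WEIGHTS = {1: 30, 2: 15, 3: 5, 4: 2, 5: 1}


def average_weight(severities):
    """Average Table 4 weight of a list of severity levels (0.0 if the list is empty)."""
    if not severities:
        return 0.0
    return sum(SEVERITY_WEIGHTS[s] for s in severities) / len(severities)


# Hypothetical severity levels of closed and still-open changes.
closed = [1, 2, 3, 3, 4]    # weights 30 + 15 + 5 + 5 + 2 = 57
remaining = [3, 4, 5, 5]    # weights 5 + 2 + 1 + 1 = 9

avg_closed = average_weight(closed)        # 57 / 5
avg_remaining = average_weight(remaining)  # 9 / 4

# A lower average weight for remaining changes suggests the most severe
# problems are being worked first.
print(avg_closed, avg_remaining, avg_remaining < avg_closed)
```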



Figure 6: Average Severity Level 


3.4 Distribution of Changes by Severity Level 

Although Figure 7 is not actually a trend, it shows how the changes are distributed by 
severity level. The sample chart exaggerates the expectation that most changes will be of 
lower severity level. 

[Chart: Number of Changes by Severity as of 31 Oct 94] 

Figure 7: Distribution of Changes by Severity Level 


3.5 Average Closure Time by Severity Level 

Figure 8 shows the average length of time required to close problems and change requests 
of each severity level and the average length of time that remaining changes have been 
open. Understanding this information and the process used to implement changes helps 
to estimate how much change traffic to expect, how many software maintainers will be 
required, and how far away the software is from being ready for release. 
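The two averages described above could be computed along the following lines. This is only a sketch: the record layout, the function name, and the dates are hypothetical and are not taken from the MEAT tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical change records: (severity level, date opened, date closed or None).
changes = [
    (1, date(1995, 1, 10), date(1995, 2, 9)),    # closed after 30 days
    (1, date(1995, 2, 1), None),                 # still open
    (3, date(1995, 1, 5), date(1995, 1, 25)),    # closed after 20 days
    (3, date(1995, 1, 15), date(1995, 2, 24)),   # closed after 40 days
]
AS_OF = date(1995, 3, 6)  # hypothetical evaluation date


def mean_days_by_severity(records, as_of):
    """Return (average closure days, average age of open changes), each keyed by severity."""
    closed = defaultdict(list)
    still_open = defaultdict(list)
    for severity, opened, closed_on in records:
        if closed_on is None:
            still_open[severity].append((as_of - opened).days)
        else:
            closed[severity].append((closed_on - opened).days)

    def averages(groups):
        return {sev: sum(days) / len(days) for sev, days in groups.items()}

    return averages(closed), averages(still_open)


avg_closed_days, avg_open_days = mean_days_by_severity(changes, AS_OF)
print(avg_closed_days)  # average days to close, per severity level
print(avg_open_days)    # average age in days of open changes, per severity level
```

When open changes of a severity level are already older than the historical closure average for that level, closure times for that level are likely to rise.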



Figure 8: Average Closure Time 


3.6 Total Changes and Change Density 

The total number of changes for each CSCI helps to identify software maturity problem 
areas. In addition to sheer numbers of changes, normalizing changes by the size (new or 
modified lines of code) for each CSCI shows which parts of the code have the most 
change requests and are most likely to require future effort. We call this normalized 
measure change density. 

[Chart: Change Density. X-axis: Computer Software Configuration Item. Series: Total Changes (bars), Changes/KLOC (line).] 

Figure 9: Total Changes and Change Density 


From the bars in Figure 9, we identify CSCIs #8, #17, #7, and #13 as portions of the 
software which have produced large numbers of changes. The lines on the same chart 
identify CSCIs #8, #12, #17, #10, and #7 as components which produce large numbers of 
changes per line of code. The union of these two sets can be thought of as maturity 
drivers for the software system. 

3.7 Remaining Changes and Defect Density 

The final trend chart is a relatively new addition to our evaluation methodology. Figure 10 
shows both remaining changes for each CSCI and the number of remaining changes 
(problems) divided by thousands of new or modified source lines of code (defect density). 
Michael Foody suggests software is not ready for release until the defect density is below 
0.5 (see footnote 1). Finding the portions of software with the most remaining problems and 
the highest defect densities adds two more pieces to the maturity puzzle. 



Figure 10: Remaining Changes and Defect Density 


The bars in Figure 10 identify CSCIs #8, #17, #7, and #13 as components with large 
numbers of remaining changes. The CSCIs with a defect density above the 0.5 threshold 
(CSCIs #8, #12, #15, #18, #17, #7, and #2) are not ready for operational testing or 
release. The union of these two sets identifies the software components that are currently 
driving software immaturity. 
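Change density, defect density, and the union of the two problem sets can be sketched as follows. The CSCI names, sizes, and counts are hypothetical, and taking only the single CSCI with the most remaining changes is an arbitrary simplification for illustration; the 0.5 threshold is the one attributed to Foody above.

```python
from dataclasses import dataclass

RELEASE_THRESHOLD = 0.5  # remaining changes per KSLOC, per the Foody guideline


@dataclass
class Csci:
    name: str
    ksloc: float            # thousands of new or modified source lines
    total_changes: int
    remaining_changes: int


def change_density(c):
    """Total changes per thousand new or modified lines."""
    return c.total_changes / c.ksloc


def defect_density(c):
    """Remaining (open) changes per thousand new or modified lines."""
    return c.remaining_changes / c.ksloc


# Hypothetical configuration items (not data from the paper).
cscis = [
    Csci("CSCI-A", ksloc=40.0, total_changes=120, remaining_changes=30),
    Csci("CSCI-B", ksloc=10.0, total_changes=15, remaining_changes=2),
    Csci("CSCI-C", ksloc=5.0, total_changes=40, remaining_changes=4),
]

# Set 1: CSCIs whose defect density exceeds the release threshold.
not_ready = {c.name for c in cscis if defect_density(c) > RELEASE_THRESHOLD}

# Set 2: the CSCI with the most remaining changes (top 1 for brevity).
by_backlog = sorted(cscis, key=lambda c: c.remaining_changes, reverse=True)
most_remaining = {c.name for c in by_backlog[:1]}

# The union of the two sets marks the components driving immaturity.
immaturity_drivers = not_ready | most_remaining
print(sorted(immaturity_drivers))
```

Note that CSCI-C has a small backlog in absolute terms but a high defect density, which is why both views are needed.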


1 Michael A. Foody, “When is Software Ready For Release?” UNIX Review, March 1995 

4. Case Study 

This section is a time-phased example of software maturity evaluation for a major Air 
Force acquisition program. The program was selected because its initial immaturity 
presents a convincing case for delaying operational use of software until maturity. The 
program name and developer will not be identified. 

4.1 Initial Evaluation - March 1995 

Software maturity was first analyzed and briefed in March 1995. Although the data was 
available to, and in fact came from, the development organization, software maturity had 
not been evaluated except to track open software change requests. The acquiring 
organization understood there were problems in the development, but had no evidence 
of how severe the problems were or where they were located. 

4.1.1 Weighted and Unweighted Software Changes 

As in most software developments, problems were initially found much more quickly than 
they were being fixed. Unfortunately, this trend continued up to the point of our initial 
evaluation. As shown in Figure 11, the total originated and total closed trends diverge 
except for a push to close changes from week 16 through 20. The result is an increasing 
backlog of software changes. At that time, we had no reason to expect a slowdown in 
change origination, and the closure rates at the time did not predict improvement. 



Figure 11: Initial Evaluation Weighted Changes 

The unweighted version of this chart (Figure 12) looks almost identical. The only 
difference is that the numbers in this chart represent actual changes and backlog size 
rather than the change points used in the weighted chart. 

[Chart: Accumulated Software Changes (unweighted) as of 06 Mar 95] 



Figure 12: Initial Evaluation - Unweighted Changes 


4.1.2 Average Severity Level 

The average weight of changes (problems) throughout the period up to the initial 
evaluation was between six and eight (Figure 13). This falls between the weight of a 
severity level three change (5) and that of a severity level two change (15). The only 
positive trend shown by this chart is that the developer has recently been working on 
the most severe problems. 



Figure 13: Initial Evaluation - Average Severity Level 


4.1.3 Distribution of Changes by Severity Level 

The distribution of changes across severity levels showed two surprising results. First, an 
unusually large number of severity level one changes were opened and remained open. 
Second, very few severity level two changes had been identified. Overall, this chart 
spurred a discussion of severity level definitions. 


Number of Changes by Severity as of 06 Mar 95 



Figure 14: Initial Evaluation - Number of Changes by Severity Level 


4.1.4 Average Closure Times by Severity Level 

The next set of trends (Figure 15) showed that most problems had historically taken 
between 35 and 40 days to formally close. Unfortunately, changes that were currently 
open at that time had, with the exception of severity level five, been open longer than the 
average of those already closed. This indicates that closure times will likely rise in the 
future. Because difficulty and severity level are not synonymous, we were careful not to 
compare closure times across severity level. 



Figure 15: Initial Evaluation - Average Closure Times 

Since the maturity data for this development program did not include information about 
which portions of the code the changes/problems related to, we were unable to produce 
change and defect density charts. 

4.1.5 Summary 

Nearly all of the trends pointed to immaturity of the software. In addition, we knew the 
test schedule was consistently being shortened to save time at the tail-end of the 
development. All parties agreed to further study this data on a biweekly basis until the 
test readiness decision in early August 1995. 

4.2 Test Readiness Decision Evaluation - July 1995 

Between the initial evaluation and the test readiness decision, the developer modified 
severity levels of many of the problems to reflect a better understanding of the severity 
level definitions. As a result, software maturity was not as bad as previously thought. 

4.2.1 Weighted and Unweighted Software Changes 

A great deal of progress was made toward closing the backlog (Figure 16, Figure 17, and 
Figure 18). Notice that changes in the slope of the curves are more dramatically shown 
on the weighted chart. For example, between weeks 13 and 17 on Figure 16, we see a 
great deal of progress in closing problems. The trend is more dramatic on the weighted 
chart because the problems were of high severity levels. 



Figure 16: Test Readiness - Weighted Software Change Trends 

4.2.2 Average Severity Level 

Figure 19 shows the average severity level of recently opened, closed, and remaining 
changes has decreased and remained stable for the last three months. 


[Chart: Average Severity of ALL S/W Changes as of 28 Jul 95. Series: Total Originated, Total Closed, Remaining. X-axis: Period Number (Weekly).] 


Figure 19: Test Readiness - Average Severity Level 


4.2.3 Distribution of Changes by Severity Level 

The developer’s better understanding of severity level definitions resulted in a 
distribution of changes that is closer to normal expectations. Unfortunately, one severity 
level one change remains unresolved. This means that execution of some part of the 
software will result in a mission failure or jeopardize safety. 

Figure 20: Test Readiness - Change Distribution 


4.2.4 Summary 

The software showed signs of improving maturity, but local trends were too short to 
declare the software mature with confidence. For this reason, and because of the open 
severity level one problem and the reduced testing schedule, we declared the software 
not ready for test. 

Due to schedule and funding constraints, the system proceeded to the operational testing 
phase despite maturity problems. Although this decision did not follow 
recommendations, we were anxious to see how the results matched with our maturity 
analyses to date. 

4.3 Initial Operational Use - August 1995 

Just days before the first operational test exercise of the software, a new version was 
delivered and checked out on the system. During the first familiarization session 
held for field operators rather than system developers, the software worked less than 
40% of the time. The list of work-around procedures for software problems grew to over 
100. 

Finally, during the first operational use of the software, it failed dramatically. A software 
failure caused an incomplete safety notification to system users. The system allowed the 
users to bypass the warning and overheat some sensitive electronic equipment. As a 
result, the one-of-a-kind system was out of commission for two months, $1.5 million in 
hardware repairs were required, a new version of the software had to be produced and 
tested, and expensive test time was lost until October 1995. 

4.4 Extended Operational Testing - October 1995 

After the lengthy delay, the system was once again accepted for test. Largely due to this 
delay, the software maturity charts appeared mature. Fortunately, this time the 
operational testing was run to completion. Unfortunately, the system had performance 
problems as well as user interface troubles. In fact, users stated they would “prefer to 
have the old system back.” Over 100 software deficiencies were identified during one 
month of operational use. Six of these software problems were judged to be severity 
level one or two. Clearly the system, and the software in particular, was not ready for 
fielding. 

4.5 Software Impact on Purchase Decision - March 1996 

After a miserable showing during initial operational testing, developers proceeded to fix 
identified problems prior to the system purchase decision. As a result of preliminary 
findings, the decision to purchase the system was delayed. The system would undergo a 
second round of operational testing to look for improvement. 

As shown in Figure 21, current maturity trends indicate the software has progressed 
toward maturity. We must temper this analysis with an understanding that the scope of 
software testing has been reduced during the period between operational testing and the 
decision to re-test the system. The impact of this reduced testing is a slower rate of 
identifying new changes. As a result, the developers were able to fix most of the 
outstanding changes including all of the severity level one and two changes. 



5. Conclusion 

Software maturity is a simple evaluation to conduct and interpret, yet the information is 
extremely useful for developers, acquirers, and operational testers. The trend charts must, 
however, be interpreted together as a whole and in the context of external factors such as 
program schedule and requirements stability. As a result, the maturity evaluator must 
have in-depth knowledge of the software development and testing. 

Specifying maturity requirements for release and following through with those decisions 
will help to ensure time and money are not wasted testing immature software, users are 
not disappointed with initial software capabilities, and software maintainers receive 
quality products. In the case study, the software maturity evaluation correctly predicted 
software immaturity. Failure to listen to this advice resulted in millions of dollars in 
repair expenses and wasted test time. 


READINESS ANALYSIS FOR AN AIR FORCE SYSTEM 


1. Introduction 

Development data from an Air Force software system are analyzed in this section using 
the three-step methodology. The data consist of weighted originated failures and weighted closed 
failures. The weights are 30, 15, 5, 3 and 1 for severity levels 1, 2, 3, 4 and 5, respectively. The 
time period of data is 86 months. 

A brief description of the data is given in Section 2. Guided by the trend statistic curve, 
analyses and maturity assessments are then done at months 70, 75, 80 and 86 in Section 3. A 
summary of the assessments is presented in Section 4. 

2. Data Description 

A graph of the cumulative weighted originated failures, cumulative weighted closed 
failures and weighted failures remaining open is shown in Fig. 21a. These values are called 
change points and thus the data are cumulative Open Change Points (OCP), cumulative Closed 
Change Points (CCP) and Remaining Open Change Points (ROCP). A cursory study of the OCP 
and CCP plots in Fig. 21a indicates very little failure activity for the first twenty-five months. 
Then there is an almost constant rate of increase up to month 60. This is followed by a convex 
curve for OCP and an almost straight line for CCP. The ROCP curve seems to be increasing up 
to month 50 and then remains constant up to month 70. Finally, it shows a decreasing trend up 
to month 86. A better understanding of their behavior can be gained from the Laplace Trend 
Statistics curves in Figs. 21b and 22 for OCP and CCP, respectively. 

Figure 21b indicates a slight reliability decay and then some growth during the first 
twenty months. It is followed by stable reliability indication up to month 27, and reliability 
growth to month 40. Then there are indications of local reliability growth and decay. Starting 
with month 60, there is strong indication of continuing reliability growth up to the present, viz., 
month 86. Figure 22 seems to follow a pattern similar to that of Fig. 21b. In practice, analysts 
track the failure phenomenon and management tries to keep up with the failure curve. In other 
words, as more change points are originated, management tries to ensure that more are closed. 
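For readers unfamiliar with the Laplace trend statistic, the sketch below uses the commonly cited grouped-data form of the Laplace factor. It is an assumption that this is the form used for Figs. 21b and 22, since the paper's own definition is not reproduced in this section. With n(i) the amount originated (or closed) in period i, the factor at period k is

u(k) = [ sum_{i=1..k} (i-1) n(i) - ((k-1)/2) sum_{i=1..k} n(i) ] / sqrt( ((k^2 - 1)/12) sum_{i=1..k} n(i) )

Negative values suggest a decreasing failure intensity (reliability growth), positive values suggest decay, and values near zero suggest stability. The input series in the example are hypothetical.

```python
import math


def laplace_trend(per_period):
    """Laplace factor u(k) for k = 2..len(per_period) (grouped-data form).

    per_period[i] is the amount observed in period i + 1.
    """
    factors = []
    for k in range(2, len(per_period) + 1):
        window = per_period[:k]
        total = sum(window)
        if total == 0:
            factors.append(0.0)  # no events yet, so no evidence of a trend
            continue
        # enumerate() is 0-based, so i here equals (period number - 1).
        weighted = sum(i * n_i for i, n_i in enumerate(window))
        numerator = weighted - (k - 1) / 2 * total
        denominator = math.sqrt((k * k - 1) / 12 * total)
        factors.append(numerator / denominator)
    return factors


# Hypothetical monthly series.
decreasing = [10, 8, 6, 4, 2]   # fewer failures each month
constant = [5, 5, 5, 5]
increasing = [2, 4, 6, 8, 10]

print(laplace_trend(decreasing)[-1])  # negative: reliability growth
print(laplace_trend(constant)[-1])    # zero: stable
print(laplace_trend(increasing)[-1])  # positive: reliability decay
```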

As mentioned earlier, readiness assessment is a difficult problem. In addition to the open 
and closed curves, it may require consideration of test rate, test completeness and requirements 
stability. Since these items are generally not available, the following assessments are based 
purely on the behavior of the OCP and CCP plots. Reexamining these plots in light of 
observations made above, it would seem that readiness assessment could have started with month 
sixty. However, by month seventy, there is strong indication of sustained reliability growth. In 
the following, the results of assessments at months 70, 75, 80 and 86 are briefly summarized. 

3. Assessments at Months 70, 75, 80 and 86 

In each case, the Laplace trend statistic curves were studied for total change points, 
originated and closed. These were used as guides for determining the NHPP model choice and 
initial parameter estimates as detailed earlier in this paper. After fitting the appropriate models, 
the best one was selected. The fitted models were then used to estimate the future failure curve 
and the model closure rate (MCR). The average closure rate (ACR) was computed from the 
change points remaining open data. The above values were then used to assess readiness. In the 
analysis given below, the system would be considered ready for release when problems remaining 
open become zero. 

The resulting analyses can be summarized graphically in four figures for each analysis 
month. The first two figures in each case would show fitted NHPP models to open and closed 
data, the third problem closure months for cases 1 and 2 and the fourth problem closure months 
for cases 3 and 4. The figures for each of the analysis months were studied and the results 
analyzed for readiness assessment. Such plots for months 80 and 86 are shown in Figures 23 to 
26 and 27 to 30, respectively. 
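The four cases combine two assumptions about new failures (none, or those predicted by the fitted model) with two closure rates (ACR or MCR). A rough sketch of the resulting closure-month calculation is given below. It assumes a constant closure rate and, purely for illustration, a Goel-Okumoto NHPP mean value function m(t) = a(1 - exp(-bt)); the model actually selected at each assessment month is not stated in this section, and the parameter values and backlog are hypothetical.

```python
import math


def go_mean(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t)) (assumed model)."""
    return a * (1.0 - math.exp(-b * t))


def closure_month_no_new(t_now, remaining, rate):
    """Cases 1 and 2: no new failures; backlog drained at a constant rate (ACR or MCR)."""
    return t_now + remaining / rate


def closure_month_with_new(t_now, remaining, rate, a, b, step=0.01, horizon=600.0):
    """Cases 3 and 4: the model adds new failures while the backlog is drained.

    Steps forward in time until the backlog plus predicted new change points
    has been closed. Returns None if that does not happen within the horizon.
    """
    base = go_mean(t_now, a, b)
    t = t_now
    while t < t_now + horizon:
        t += step
        predicted_new = go_mean(t, a, b) - base
        closed = rate * (t - t_now)
        if remaining + predicted_new - closed <= 0:
            return t
    return None


# Hypothetical inputs: assessment at month 70, 1000 change points open,
# closure rate of 250 change points per month, assumed model parameters a and b.
T_NOW, REMAINING, RATE = 70.0, 1000.0, 250.0
A, B = 5000.0, 0.05

without_new = closure_month_no_new(T_NOW, REMAINING, RATE)
with_new = closure_month_with_new(T_NOW, REMAINING, RATE, A, B)
print(without_new, with_new)
```

Passing the ACR as `rate` corresponds to cases 1 and 3, and passing the MCR corresponds to cases 2 and 4. Accounting for new failures can only push the closure month later, which matches the ordering seen in the table of results.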


Assessment 
Month         70            75            80            86 

Case 1        77.4 (332)    84.0 (340)    87.2 (348)    90.4 (349) 
Case 2        78.0 (305)    87.2 (254)    90.5 (238)    94.1 (191) 
Case 3        83.4 (332)    91.3 (340)    91.5 (348)    92.3 (349) 
Case 4        84.9 (305)    98.0 (254)    98.3 (238)    98.8 (191) 


4. Summary of Assessments 

The above table summarizes the results of various analyses at months 70, 75, 80 and 86. 
It gives the failure closure month (the month in which all remaining open failures are closed) 
for each assessment month and for each of the four cases. The corresponding values of ACR 
and MCR are given in parentheses. Thus for case 1 at month 70, the average failure closure 
rate is 332 per month and all currently open failures should be resolved by month 77.4. For 
case 4, month 80, the model-based closure rate is 238 per month and current unresolved 
failures and the failures to be detected should be resolved by month 98.3. A graphical 
representation of these results is shown in Figure 31. 

Some observations from the above table are summarized below. 


Case 1. 

This represents the situation when no new detected failures are assumed and the average 
closure rate (ACR) is used to close the remaining open problems. For this data set, the 
ACR is almost constant. The change, from one assessment month to the next, in the month 
at which remaining open problems reach zero is due to the additional new failures detected 
since the previous assessment month. 

Case 2. 

The model closure rate in this case is decreasing for each successive assessment month 
because of the decreasing closure rate. It would take longer to resolve the open faults 
than for case 1 for each respective assessment month. 

Case 3. 

Compared to case 1 (which also assumes an average closure rate) this case explicitly 
accounts for the extra time required to resolve the failures to be detected in future months. 
This is a more realistic situation than case 1 would represent. 

Case 4. 


Just as in Case 2, the closure rate is decreasing for each successive assessment month. 
Hence it would take longer to resolve the problems remaining open than in case 3 for 
each respective assessment month. 


Figure 21a: Accumulated software changes (y-axis: Change Points) 

Figure 22: Trend test for closed data 

Figure 25: Readiness analysis at month 80 not accounting for new faults (y-axis: Change Points x 10^3) 

Figure 26: Readiness analysis at month 80 accounting for new faults (y-axis: Change Points x 10^3) 

Figure 28: Closed d 

Figure 29: Readiness analysis at month 86 not accounting for new faults (y-axis: Change Points x 10^3) 

Figure 30: Readiness analysis at month 86 accounting for new faults (y-axis: Change Points x 10^3) 

Figure 31: Graphical representation of readiness assessments at months 70, 75, 80 and 86 (axis: Problem Closure Month) 


Assessment of an Air Force 

Study 

A. Goel, Syracuse University 
B. Hermann, U.S. Air Force 
R. McCanne, U.S. Air Force 


Outline 



■ Summary of Operational Test Readiness Problem 

■ Current AFOTEC Approach 

■ New Approach - Air Force System Case Study 

■ Advantages/Limitations 

■ Where to Next . . . 


Summary of Operational Test 
Readiness Problem 



■ AFOTEC operationally tests systems to ensure 
they meet user requirements 

■ Operational testing is very expensive (especially for 
embedded systems) 

■ Need a method for determining if system is ready 
for operational testing 

- Use templates (checklists) 

- Software maturity evaluation 

» CURRENT STATUS ONLY 
» NEED ABILITY TO PREDICT 


Current AFOTEC Approach 



■ Software Maturity - progress software products 
are making toward meeting user requirements 

■ Use software problem/change report data and 
categorize by severity level 

- Change Points = Severity Weight * Number of Changes 

■ Limited to current status with only crude 
“straight-line” estimates 


Current AFOTEC Approach 



■ Backlog Chart 


Current AFOTEC Approach 



Demonstration of Maturity Evaluation Shortcomings 

■ Embedded Air Force System 

■ Completely hopeless initial evaluation 

■ Improving test readiness evaluation - but not yet 
acceptable 

■ Decision makers proceeded with operational 
testing 

- Found lots of BIG problems — Required new software build 

- $1.5 Million damage to system 

- 2 Month Test Delay 


Current AFOTEC Approach 



Problems With Current Approach 

■ Too often ignored by senior decision makers 

■ Lacks the ability to reasonably predict future 
maturity status 


New Approach - AF Case Study 



Three Step Method 

1. Statistical Trend Analysis 

2. Reliability Growth Modeling 

3. Readiness Evaluation 


Statistical Trend Analysis 



Assess Trends in Data 

■ Laplace Trend (LT) Test 

- Widely studied, applied to s/w reliability problem 

- UMP unbiased test when paired with some NHPP models 

■ Indicates stable, increasing, or decreasing failure 
rate trend 


Reliability Growth Models 

■ Used to estimate the impact of undiscovered faults 

■ Also can be used to estimate the closure rate 

■ Two common problems . . . 

- Model Selection 

- Estimation of Model Parameters 


Readiness Evaluation 


Four Cases 
■Case 1 

- No new failures 

- Average Closure Rate 
(ACR) 


■Case 2 

- No new failures 

- Model-based Closure 
Rate (MCR) 


■Case 3 

- New failures according to 
RGM 

- Average Closure Rate 
(ACR) 


■Case 4 

- New failures according to 
RGM 

- Model-based Closure 
Rate (MCR) 


Post-Mortem Results 
for an Air Force System 


Assessment 
Month         70       75       80       86 

Case 1        77.4     84.0     87.2     90.4 
(ACR)         (332)    (340)    (348)    (349) 

Case 2        78.0     87.2     90.5     94.1 
(MCR)         (305)    (254)    (238)    (191) 

Case 3        83.4     91.3     91.5     92.3 
(ACR)         (332)    (340)    (348)    (349) 

Case 4        84.9     98.0     98.3     98.8 
(MCR)         (305)    (254)    (238)    (191) 


Advantages/Limitations 



■Advantages 

- Provides an objective and systematic framework for analytically 
performing readiness assessments 

- Can be adapted to be consistent with current AFOTEC 
approach 

■ Limitations 

- Assumptions must represent actual development environment 

- Practical use requires a good understanding of underlying 
theoretical framework 

- Requires tool support to perform necessary analyses 


Conclusions 



■ Proposed a method of three iterative steps for 
conducting assessments to determine software 
readiness for dedicated OT&E 


■ Methodology explicitly uses trend test and 
reliability models for decision making 

■ It extends current AFOTEC approach 

- considers undetected faults 

- provides two estimates of fault closure rate 


Questions? 


Session 5: Case Studies 



Risk Knowledge Capture in the Riskit Method 
J. Kontio and V. Basili, University of Maryland 


Requirement Metrics for Risk Identification 
T. Hammer and L. Hyatt, NASA Goddard, W. Wilson, L. Huffman, and L. 
Rosenberg, Software Assurance Technology Center 


Applying the SCR Requirements Specification Method to Practical Systems: A Case Study 
R. Bharadwaj and C. Heitmeyer, Naval Research Laboratory 


Risk Knowledge Capture in the Riskit Method 


Jyrki Kontio and Victor R. Basili 

University of Maryland 
Department of Computer Science 
A. V. Williams Building 
College Park, MD 20742, U.S.A. 

Abstract 

This paper describes how measurement data and experience can be captured for risk 
management purposes. The approach presented is a synthesis of the Riskit risk 
management method and the Experience Factory. In this paper we describe the main 
goals for risk knowledge capture and derive a classification of information based on 
those goals. We will describe the Riskit method and its integration with the 
Experience Factory. We will also outline the initial experiences we have gained from 
applying the proposed approach in practice. 


1. Introduction 

Unanticipated problems frequently cause major problems to projects, such as cost overruns, 
schedule delays, quality problems, and missing functionality. To some degree these problems can 
be seen as signs of immaturity of our field and we should expect some improvements in our 
discipline as our methods and knowledge improve. However, as each software development 
project involves at least some degree of uniqueness and our technology changes continuously, 
uncertainty about the end results will always accompany software development. While we cannot 
remove risks from software development, we should learn to manage them better. 

Ability to capture, analyze and package experience is a prerequisite for systematic, planned 
improvements in software engineering [2], as in any field. The framework proposed in this paper 
builds upon the Riskit method and the Experience Factory, both developed at the University of 
Maryland. The proposed risk knowledge capture framework contains templates for capturing data 
about risk elements, templates for capturing relevant information about the risk management 
process, definition of where in the risk management process risk management knowledge is 
captured and utilized, and a proposed model for improvement goals for risk management. 

2. Background 

Risks in software development were not addressed in detail until the late 1980s, when Boehm [6] 
proposed and synthesized approaches for software risk management. His work was 
complemented by Charette [9], and on these foundations recent advances in software risk 
management have produced well-documented approaches for risk management [14,18,24,26], 
several categories of risks have been identified [6,8,23], quantitative approaches for risk 
management have been proposed and used [5,7,11], and there are several software tools available 
for risk management. Furthermore, most commonly used software engineering standards [15,16] 
or assessment frameworks [17,27] require at least some form of risk management to take place. 

Despite these efforts and the obvious industry interest in risk management, it seems that few 
organizations apply specific risk management methods actively [28]. The limited survey data 
from a recent workshop by Basili and Koji Tori supports this observation: only 20% of 
respondents claimed to use risk management techniques “extensively” while 40% stated that they 
are not using “any risk management techniques or approaches” [19]. Clearly, the industrial 
practice of risk management methods has not yet reached its full potential. 

There is little reported work on utilizing data and experience from past projects in the software 
engineering risk management literature. Some aspects of Boehm’s work implicitly assumed that 
data from past projects is available if simulation and cost models are used for estimating risks [6]. 
He also mentioned factors of cost models as possible risk monitoring metrics. Charette has 
presented an outline of items that should be defined for a project to initiate risk management [10]. 
He has also given examples of what should be measured and how this data can be graphed for risk 
management purposes. However, neither one of these approaches can be considered a systematic 
way to capture or utilize risk management experience. 

The Software Engineering Institute (SEI) has collected data from risk assessments they have 
carried out during the last few years. Their goal seems to be to support analysis of risks and their 
relationships using lexical analysis on the qualitative descriptions in the database [25]. It also 
seems that frequencies of risks in the database have been used to indicate what are the most 
common risks. To our knowledge, this database focuses on the results of risk assessments and 
contains little or no data of what actually happened in projects. Also, it is not clear how much 
context information is captured about risks and projects so that information in the database can be 
utilized more effectively. 

Hall has defined and implemented a risk database while working at Harris Corporation [12]. 
Risks from three projects were collected [13] and used for analysis in evaluating Hall’s risk 
management maturity model. Hall has also collected survey data on the levels of risk 
management practices in various organizations [12]. 

There have been several other, less formal approaches in documenting information about 
software risks. The ACM SIGSOFT Software Engineering Notes has run a long series of reports 
on computer related problems or disasters. However, such a list is not very useful for analyzing 
risks of an individual project, as most of the reported risks do not contain enough context 
information and details to be useful. 

In summary, it seems that while some advances have been made in the area of software 
risk knowledge capture, none of the reported approaches provide a comprehensive framework for 
capturing risk knowledge. Furthermore, software risk management data and knowledge is rarely 
systematically collected and utilized in the industry. We hope that the framework proposed in this 
paper can act as a step towards more systematic risk knowledge capture so that our 
understanding of risks and risk management methods can improve. 

3. Risk Knowledge Capture 

We have identified three generic types of goals for risk knowledge capture: monitoring risks, 
understanding risks, and risk management process improvement. First, the risk situation in a 
project needs to be monitored so that appropriate risk controlling action can be taken. Second, we 
need to collect information about risks so that frequencies of occurrence and losses of risks can be 
estimated better. Finally, information needs to be collected so that the risk management process 
itself can be improved. 

Each of the three goals described above focuses on different kinds of information and, as always 
in measurement, the individual metrics and data collection procedures may vary between 
situations. However, we have identified some generic classes of information based on these three 
goals. This risk information classification will be introduced in the following paragraphs. 

Project context information refers to such information that determines the circumstances and 
setting where the project is carried out. Project context information is relevant for all software 
engineering measurement data, but it is particularly important for risk management. The 
probability of a risk event is often influenced by many factors. By capturing as much as possible of 
the risk management context information we make it easier to interpret risk management data in 
the future. 

The risk management infrastructure information defines what risk management methods, 
techniques, tools, processes and approaches are used in risk management. The risk 
management infrastructure can also be extended to include several other organizational issues that 
marginally influence risk management, as proposed by Hall [12]. In fact Hall’s framework can be 
used as a model to document the state of risk management infrastructure in an organization. 

The project information defines the project itself and it includes the definition of the goals, 
customers, schedule, and constraints of the project. It also includes the definition of the risk 
management mandate for the project: the risk management mandate is a project-specific statement 
of the scope of risk management in a project. 



                                              Risk         Understanding   Risk management
                                              monitoring   risks           process improvement

Project context information                   X            X               X
Risk management infrastructure information                                 X
Project information                                        X               X
Enactment data                                X            X               X
Risk management process information                                        X
Risk element information                      X            X               X

Table 1: The relationships between risk knowledge capture goals and risk information types 

While the project information provides a static view of the project, enactment data provides 
the dynamic perspective: how much effort is spent, what artifacts are produced and 
when, how much time has passed, and which individuals worked on the project. Enactment data is 
usually collected for project control and experience capture purposes as part of a software 
engineering measurement program. 

The risk management process information describes the activities and events related to risk 
management in the project. The risk management process information is, in fact, a special case of 
project information, but as it represents our special focus, it is meaningful to separate it from the 
general enactment data of the project. 

Finally, risk element information refers to information about risks in a project. This type of 
information can include descriptions of factors that influence risks, such as methods, tools, 
resources; events that may influence the project; or impacts that risks might have. As we will 
discuss later, the Riskit method contains conceptual tools to structure such information more 
formally than is usually done. 

The relationships between risk knowledge capture goals and risk information types are presented 
in Table 1. Each row in Table 1 represents a risk information type and each column a risk 
knowledge capture goal. An “X” in a cell indicates that the goal in that column normally needs to 
utilize the type of information listed in that row. However, it is important to point out that 
information from other categories may often be needed as well. Table 1 merely represents what 
we believe to be typical relationships between goals and information types. 
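
The goal-to-information mapping of Table 1 can be written down as a small lookup structure. The following is a minimal sketch rather than part of the framework itself: the identifiers are illustrative, and the placement of the marks in the sparsely filled rows is our reading of the table layout and should be treated as an assumption.

```python
# Table 1 encoded as: information type -> goals that typically need it.
# All identifiers are illustrative; the paper defines no such names.
MONITORING = "risk monitoring"
UNDERSTANDING = "understanding risks"
IMPROVEMENT = "risk management process improvement"

TABLE_1 = {
    "project context information": {MONITORING, UNDERSTANDING, IMPROVEMENT},
    # Assumption: the single mark in this row sits in the last column.
    "risk management infrastructure information": {IMPROVEMENT},
    # Assumption: the two marks in this row sit in the last two columns.
    "project information": {UNDERSTANDING, IMPROVEMENT},
    "enactment data": {MONITORING, UNDERSTANDING, IMPROVEMENT},
    # Assumption: the single mark in this row sits in the last column.
    "risk management process information": {IMPROVEMENT},
    "risk element information": {MONITORING, UNDERSTANDING, IMPROVEMENT},
}


def info_types_for(goal):
    """Return the information types a goal typically needs, in table order."""
    return [info for info, goals in TABLE_1.items() if goal in goals]


if __name__ == "__main__":
    for goal in (MONITORING, UNDERSTANDING, IMPROVEMENT):
        print(goal, "->", info_types_for(goal))
```

Such a lookup only records the typical relationships; as noted above, information from other categories may often be needed as well.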

4. Towards a Risk Knowledge Capture Framework 

4.1 The Riskit Method 

The Riskit method has been developed to support systematic risk analysis. It uses a graphical 
formalism to support qualitative analysis of risk scenarios before quantification is attempted; 
its risk ranking approach can be selected based on the availability of history data or the 
accuracy of estimates; it supports multiple goals and stakeholders; and its risk ranking approach 
is based on utility theory [20]. An overview of the activities in the Riskit process is presented 
in Figure 1. More information about the method is available in separate reports [20-22]. 

A central part of the Riskit method is the graphical formalism used to document risks, the 
Riskit analysis graph. The Riskit analysis graph is used to define the different aspects of risk 
explicitly and more formally than is done in casual conversation. The Riskit analysis graph is used 
during the Riskit process to decompose risks into clearly defined components, risk elements. Its 
components are presented in Figure 2. Each rectangle in the graph represents a risk element and 
each arrow describes the possible relationship between risk elements. We will define the 
components of the graph in the following paragraphs. 

Instead of informal, general descriptions of 
risks, we can document the different aspects of 
risks more precisely, as is shown in Figure 2. 
The Riskit analysis graph allows explicit and 
more formal documentation of risks and risk 
scenarios. 

The Riskit method has several potentially 
useful characteristics that can support risk 
knowledge capture. First, the Riskit Analysis 
Graph enforces more formal definition of risks 
so that more information is collected about each 
risk. Second, the graphical formalism used as 
well as the tool that is used to draw these 
diagrams lay the foundations for automating 
some of the risk knowledge capture: 
information about risks can be captured as 
Riskit graphs are drawn. Third, the Riskit 
process itself is a defined process that increases 
repeatability of the risk management process 
and supports the collection of relevant risk 
management experience through the templates 
and guidelines included in the method. 

[Figure 1 not reproduced; surviving labels: objectives, expectations, constraints] 

Figure 1: The Riskit risk management cycle (see footnote 1) 

[Figure 2 not reproduced; surviving label: may influence] 

Figure 2: A conceptual view of the elements in the Riskit analysis graph 

Footnote 1: Figure 1 presents a simplified view of the activities in the Riskit process. A more 
comprehensive description of the Riskit process is available in other publications [20]. 

4.2 Risk Knowledge Capture in the Experience Factory Framework 

In this section we present how the Riskit method can be integrated into Basili’s Experience 
Factory (EF) and Quality Improvement Paradigm (QIP) [3,4]. The QIP is a systematic process for 
continuous improvement. It is similar to the scientific principle of learning in its emphasis on 
learning through empirical experience. The QIP process can be seen as consisting of three main 
activities that include the six steps normally described for QIP: planning, consisting of the steps 
characterize, set goals, and choose process; execute; and learning, consisting of the steps analyze 
and package [4]. 

The Experience Factory Organization is an organizational model for implementing the QIP 
process. The main idea of this approach is the recognition of the distinct roles belonging to the 
project organization and a learning organization, the Experience Factory. The Project 
Organization focuses on delivering the software product and the Experience Factory focuses on 
learning from experience and improving software development practice in the organization. A 
central aspect of the Experience Factory is the Experience Base, a repository of data and 
knowledge about the software development process and products. The knowledge in the 
Experience Base can be in various forms: it can include raw and summarized data, mathematical 
models about the data (e.g., prediction models), experiment reports, and qualitative lessons 
learned reports [1-4]. 

From a risk management perspective, the Experience Factory concept serves to fulfill the 
following goals: 

• separation of responsibilities between risk management within projects and improving the 
risk management process itself and improving the understanding of risks; 

• systematic capture and accumulation of risk management knowledge into the Experience 
Base; 

• continuous learning from risk management experience through measurement, data 
collection, analysis and synthesis; and 

• systematic reuse of accumulated risk management knowledge through packaging and 
dissemination of this knowledge. 

When the Riskit process is viewed from the perspective of the Experience Factory and the QIP 
cycle, it is possible to identify steps where the risk management process needs to be initiated to 
support the QIP process, as shown in Figure 3. The initial planning cycle represents the first cycle 
of the Riskit process, whereas the risk management cycle supporting the execute step mainly 
supports project monitoring, i.e., risk monitoring and control. The learning step analyzes and 
packages the risk management experience gained through the process. 

All of the QIP and Riskit activities represented in Figure 3 produce data about risk 
management that can be captured and stored in an experience base. We have defined a database 
definition for such information for the Riskit process. Furthermore, the project planning step in 
QIP also includes goal definition for risk understanding and risk management process 
improvement. These goals can introduce new data and experience capture needs that can be 
implemented as required. The learning step of QIP, and the two risk related activities associated 
with it, utilize the data and experience collected about risks and produce packaged, reusable 
pieces of risk knowledge to be stored in the Experience Base and utilized in future projects. 

Figure 3: The mapping between QIP cycle and the Riskit process 

4.3 Applying the Riskit Knowledge Capture Framework 

The Riskit method and its knowledge capture framework have been applied in several trial 
projects. So far the case studies have focused on the last one of the goals we introduced earlier: 
improving the method itself. 

The goals of the first case study [22] were to characterize the method, investigate its feasibility, 
and to collect empirical feedback on its use to be able to improve it. This first case study resulted 
in several changes in the method itself and it produced approximately 15 risk scenarios 
(corresponding to about 50 risk elements). Project and context information was documented 
informally in a separate report [22]. Other, on-going empirical studies with the method focus 
similarly on obtaining feedback on the method’s feasibility and effectiveness. 

These case studies have produced large amounts of risk management data and experience and 
we are in the process of formalizing this data into a risk management database, or a risk 
management experience base. Our goal is to evaluate the feasibility and potential benefits of such 
a database given the empirical data we have obtained. 


SEW Proceedings 


315 


SEL-96-002 



5. Conclusions 


This paper presented background and motivation for risk knowledge capture and proposed a 
classification of goals and information types for such capture. We also outlined how the Riskit 
method supports this type of experience capture. We reported some initial experiences from the 
use of the Riskit method and the proposed risk knowledge capture framework. 

The potential benefits from risk knowledge capture are significant. Frequency and severity of 
typical risks can be estimated more accurately, changes in potential risks observed more 
concretely, risk management methods and tools can be improved based on empirical feedback, 
and projects have more up-to-date information about risks and risk management actions in a 
project. Furthermore, it may be possible to identify and package some risk management patterns: 
reusable pieces of risk management knowledge that can be utilized by project managers. Examples 
of such risk patterns could be lists of risks that are associated with certain project characteristics 
and descriptions of risk controlling actions that have been found effective in controlling certain 
types of risks. The Riskit method itself, through its more formal definition of risk and its graphical 
representation formalism, provides a good basis to capture and reuse such knowledge in practice. 

While it is too early to draw any conclusions about the feasibility and benefits of the proposed 
risk knowledge capture approach, the combination of Riskit and the Experience Factory contains 
the necessary foundations for more systematic and detailed experience capture. The initial 
empirical studies indicate that the approach is feasible in an industrial context. 

However, it is yet to be determined whether such experience capture is cost effective. 
Although the Riskit method may potentially allow automation of some of the experience capture 
processes, it is currently a manually driven process and therefore potentially too costly in large 
scale use. Furthermore, given the subjective nature of the definition of risk, one could also 
question how reliable is experience that, to a large degree, is based on subjective opinions and 
judgment calls about future events. 

While there may be some valid concerns about the cost-effectiveness of a risk management 
database and its utilization, it is nevertheless likely that risk management experience needs to be 
captured and formulated into knowledge to be reused in future projects. The Riskit method 
provides a more concrete basis for the qualitative knowledge formulation process, even when 
the risk management experience and data are not captured into a formal database but stored in 
less formal parts of the Experience Base. 

■ Definition of risk 

■ The Riskit method 

   ♦ Underlying principles 

   ♦ Riskit process through an example 

■ Case studies 

■ Conclusions 




Definitions of Risk 

Risk: a possibility of loss, or any characteristic, object or action that is associated with 
that possibility. 

Risk is associated with: 

- probability: there is uncertainty 

- loss: some harm or damage 

   • goals or expectations 

   • stakeholder 

■ Risk management refers to a systematic and explicit approach used for identifying, 
analyzing and controlling risk. 

[Diagram not reproduced; surviving labels: is defined by, Expectation, belongs to, Stakeholder] 


Riskit Main Principles 

• Risks are relative to goals and expectations 

• There’s always more than one 
stakeholder 

• Risks must be well defined 

• Multiple goal effects are accounted for 

• Losses estimated through utility loss 

• Learn from past experience 

The Riskit Process 

[Figure not reproduced; surviving label: constraints] 

Example 

• This presentation 

• Stakeholders 

- Audience 

- Presenter 

- Session chair 

• Goals 

- Learn about risk management 

- Finish in 30 minutes 

- Sell Riskit to practitioners 


Example: Review and Definition of Goals 

Goal                    Stakeholders       Metrics               Target

Learn about risk mgmt   • Audience         • Feedback
                                           • Questions asked
                                           • Use of Riskit?

Finish in 30 mins       • Audience         • Elapsed time        30 minutes
                        • Session chair

“Sell” Riskit           • Presenter        • Feedback            Some will try it out
                                           • Questions asked
                                           • Info requests
                                           • WWW visits...

“Risks are relative to goals and expectations” 

“There’s always more than one stakeholder” 

[Figure not reproduced; surviving labels: Review/define goals, objectives] 



Example: Risk Identification 

• Possible risks: 

- Talk will last longer than 30 minutes 

- On line slide presentation system fails 

- Presenter will mess up his slides 

- Too many questions at the end 

- Presenter will ramble off the topic 

- Audience does not have much background in 
risk management 

- Booster rockets from the space shuttle hit this 
building 

Example: Risk Identification 

■ Selected risks for risk analysis: 

- Talk will last longer than 30 minutes 

- On line slide presentation system fails 

- Presenter will mess up his slides 

- Too many questions at the end 

- Presenter will ramble off the topic 

- Audience does not have much background in risk 
management 

- Booster rockets from the space shuttle hit this building 



[Figure not reproduced; surviving labels: objectives, expectations, constraints] 

Risk analysis example 

[Figure not reproduced] 

Ranking Risk Effects 

“Losses estimated through utility loss” 

                               Stakeholders:
Effects:                       Audience       Presenter      Session Chair

Poor learning, Poor sale       [illegible]    [illegible]    Low
Time exceeded                  Med            Low            [illegible]
Fair learning                  Med            [illegible]    Low
Poor learning, Fair sale       [illegible]    Med            Low


Example: Selecting the scenarios 

Presenter     Loss High     Loss Med      Loss Low
Prob High     Scenario 1                  Scenario 2
Prob Med      Scenario 3
Prob Low                    Scenario 4

Audience      Loss High     Loss Med      Loss Low
Prob High     Scenario 1    Scenario 2
Prob Med                    Scenario 3
Prob Low      Scenario 4

Chair         Loss High     Loss Med      Loss Low
Prob High     Scenario 2                  Scenario 1
Prob Med                                  Scenario 3
Prob Low                                  Scenario 4

Ranking       Loss High     Loss Med      Loss Low
Prob High     I             II            III
Prob Med      II            III           IV
Prob Low      III           IV            V
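
The scenario selection step above can be sketched in a few lines of code. This is a minimal illustration, not the Riskit tool: the function and variable names are hypothetical, the Roman-numeral ranks are written as integers, and the position of each scenario in the three grids is our reading of the slide layout. The resulting orderings agree with the stakeholder priorities listed on the Risk Control Planning slide.

```python
# Ranking matrix from the slide: rows = probability (High, Med, Low),
# columns = loss (High, Med, Low). Rank 1 (I on the slide) is most urgent.
LEVELS = ("High", "Med", "Low")
RANK = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]

# (probability, loss) of each scenario per stakeholder.
# Assumption: grid positions are read from the layout of the slide.
PLACEMENTS = {
    "Presenter": {1: ("High", "High"), 2: ("High", "Low"),
                  3: ("Med", "High"), 4: ("Low", "Med")},
    "Audience": {1: ("High", "High"), 2: ("High", "Med"),
                 3: ("Med", "Med"), 4: ("Low", "High")},
    "Chair": {1: ("High", "Low"), 2: ("High", "High"),
              3: ("Med", "Low"), 4: ("Low", "Low")},
}


def rank(prob, loss):
    """Look up the rank of a (probability, loss) cell in the matrix."""
    return RANK[LEVELS.index(prob)][LEVELS.index(loss)]


def priorities(stakeholder):
    """Group scenario numbers by rank, most urgent first; ties share a group."""
    by_rank = {}
    for scenario, (prob, loss) in PLACEMENTS[stakeholder].items():
        by_rank.setdefault(rank(prob, loss), []).append(scenario)
    return [sorted(by_rank[r]) for r in sorted(by_rank)]


if __name__ == "__main__":
    for name in PLACEMENTS:
        print(name, priorities(name))
```

Keeping one placement grid per stakeholder reflects the principle that there is always more than one stakeholder: the same scenario can rank first for one party and third for another.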



Risk Control Planning 


Presenter’s priorities: 

- Scenario 1 

- Scenario 3 

- Scenario 2 

- Scenario 4 

Audience’s priorities: 

- Scenario 1 

- Scenario 2 

- Scenario 3 and 4 

Chair’s priorities 

- Scenario 2 

- Scenario 1 

- Scenario 3 

- Scenario 4 




■ Joint risk control 
for Scenario 1 and 
Scenario 2 

■ Scenario 3 is 
presenter’s problem 
(and so is scenario 4) 

Risk Control Planning for 

• Test the on line presentation system thoroughly 

• Bring back up slides for overhead 


Risk Control Planning for 
Scenario 2 

• Test the on line presentation system thoroughly 

• Have a back up system ready 

• Bring back up slides for overhead 

Risk Control Planning for 
Scenario 3 


• Provide references for further information 

• Hang around after the talk 




Risk Management Experience Capture 

• Goals 

- Risk management process improvement 

- Risk understanding 

- Risk monitoring 

• Means 

- Risk management Experience Base 

- Risk management experience analysis 

Risk Management Experience Base 

[Figure not reproduced; surviving element labels: Controlling action, Risk event, Risk reaction, 
Effect (set), Utility loss, Risk scenario, Goal metric, Project context, Risk management 
infrastructure] 


Empirical Studies 

• SEL Case Study 

- exploratory study to support method 
development 

• Hughes Case Study 

- exploratory study on method use 

- describe the method, assess feasibility, 
compare effectiveness 

- Produced 4 stakeholders, 17 goals and 48 risks 

Case Study Experiences 

• Riskit results in more detailed description 
and analysis of risks 

• Method users gave high marks for Riskit for 

- “Well-defined process, usable and practical” 

- “Provides a high-level view of all risks” 

- “More confidence in results, more thorough, more 
complete analysis” 

• Identified risks that normal approach might 
have ignored 

• Riskit consumed more resources 


Conclusions 

• Benefits 

- avoids common limitations in risk management 
(multiple goals and stakeholders, risk ranking) 

- explicit and precise description of risks 

- increases user confidence in results 

- captures risk management experience 

• Potential problems 

- higher cost 

• Further work 

- case studies continue (e.g. Nokia Corporation) 

- potential automation for graphs and database 

1. Introduction 

The Software Assurance Technology Center (SATC) is part of the Office of Mission 
Assurance of the Goddard Space Flight Center (GSFC). The SATC’s mission is to assist 
National Aeronautics and Space Administration (NASA) projects to improve the quality of 
software which they acquire or develop. The SATC’s efforts are currently focused on the 
development and use of metric methodologies and tools that identify and assess risks 
associated with software performance and scheduled delivery. This starts at the requirements 
phase, where the SATC, in conjunction with software projects at GSFC and other NASA 
centers is working to identify tools and metric methodologies to assist project managers in 
identifying and mitigating risks. This paper discusses requirement metrics currently being 
used at NASA in a collaborative effort between the SATC and the Quality Assurance Office at 
GSFC to utilize the information available through the application of requirements management 
tools. 

Requirements development and management have always been critical in the implementation 
of software systems - engineers are unable to build what analysts cannot define. Recently, 
automated tools have become available to support requirements management. The use of 
these tools not only provides support in the definition and tracing of requirements, but also 
opens the door to effective use of metrics in characterizing and assessing risks. Metrics are 
important because of the benefits associated with early detection and correction of problems 
with requirements; problems not found until testing are at least 14 times more costly to fix 
than problems found in the requirements phase. This paper discusses two facets of the 
SATC’s efforts to identify requirement risks early in the life cycle, thus preventing costly 
errors and time delays later in the life cycle. 

The first effort that will be discussed is the development and application of an early life cycle 
tool for assessing requirements that are specified in natural language. This paper describes the 
development and experimental use of the Automated Requirements Measurement (ARM) 
tool. Reports produced by the tool are used to identify specification statements and structural 
areas of the requirements document which need to be improved. 

The second effort discusses metrics analysis of information in the requirements database used 
to provide insight into the stability and expansion of requirements. The research into 
attaching certain document attributes to analysis results for requirements stored in 
requirements databases is providing project management with valuable information. The 
correlations between document structure and language, and requirement expansion and testing 
have been strong. This information has been assisting and continues to assist the Quality 
Assurance Office in its project oversight role. 

When discussing metric results the project must remain anonymous; however, for this paper, a 
general understanding of the project’s development environment is necessary. The project in 
discussion is implementing a large system in three main incremental builds (see footnote 1). The development 
of these builds is overlapping, e.g. design and coding of the second and third builds started 
prior to the completion of the first build. Each build adds new functionality to the previous 
build and satisfies a further set of requirements. The definition of requirements for this system 
started with the formulation of System Level Requirements. These are mission level 
requirements for the spacecraft and ground system; they are at a very high level and rarely, if 
ever, change. Requirements at this level will not be discussed since they are not stored in the 
requirements database under scrutiny. 

System requirements then undergo several levels of decomposition to produce Top Level 
Requirements. These requirements are also high level and change should be minimal. The 
development of the project discussed in this paper started with the Top Level requirements. 
Top Level requirements are then divided into subsystems and a further level is derived in 
greater detail; hence, “Specification Requirements”. Generally, contracts are bid using this 
level of requirement detail. The Design Requirements are derived from the Specification 
requirements; these requirements are the ones used to design and code the system. This 
project chose to develop an additional intermediate set of Specification Level Requirements 
after contract award. 

2. Automated Requirements Measurement Tool (ARM) 

Despite the significant advantages attributed to the use of formal specification languages, their 
use has not become common practice. Because requirements that the acquirer expects the 
developer to contractually satisfy must be understood by both parties, specifications are most 
often written in natural language. The use of natural language to prescribe complex, dynamic 
systems has at least three severe problems: ambiguity, inaccuracy and inconsistency. Many 
words and phrases have dual meanings which can be altered by the context in which they are 
used. Weak sentence structure can also produce ambiguous statements. For example, the 
statement “Twenty seconds prior to engine shutdown anomalies shall be ignored.” could result 
in at least three different implementations. Defining a large, multi-dimensional capability 
within the limitations imposed by the two dimensional structure of a document can obscure 
the relationships between individual groups of requirements. 

The SATC developed the Automated Requirements Measurement (ARM) tool to address 
certain management needs: that of providing metrics which NASA project managers can use 
to assess the quality of their requirements specification documents and that of identifying risks 
poorly specified requirements introduce into any project. The ARM tool searches the 
requirements document for terms the SATC has identified as quality indicators. Reports 
produced by the tool are used to identify specification statements and structural areas of the 
requirements document which need improvement. It must be emphasized that the tool does 
not assess correctness of the requirements specified; it does, however, assess the structure, 
language, and vocabulary of both the document itself and the individual requirements. 

Footnote 1: Various names are used (deliveries, releases, builds), but the term build will be 
used in this paper. 
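
The kind of term search described above can be illustrated with a short sketch. This is not the ARM tool: the function name and report structure are hypothetical, and the term lists are only a small illustrative subset taken from the indicator categories described in Section 2.2, not the SATC's actual lists.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative subset of indicator terms per category.
INDICATORS = {
    "imperative": ["shall", "must", "required", "will"],
    "continuance": ["the following:"],
    "weak phrase": ["adequate", "as appropriate"],
}


def count_indicators(text):
    """Count whole-word, case-insensitive occurrences of each indicator term."""
    counts = Counter()
    lowered = text.lower()
    for category, terms in INDICATORS.items():
        for term in terms:
            # Lookarounds keep "will" from matching inside "willing", etc.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            hits = len(re.findall(pattern, lowered))
            if hits:
                counts[(category, term)] = hits
    return counts


if __name__ == "__main__":
    sample = (
        "The system shall log all anomalies. "
        "The operator must be provided adequate warning. "
        "The system shall support the following: (a) telemetry."
    )
    for (category, term), n in sorted(count_indicators(sample).items()):
        print(f"{category:12} {term:16} {n}")
```

As with the tool itself, such counts say nothing about whether a requirement is correct; they only flag wording that deserves a closer look.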

2.1 Specification Quality Attributes 

The SATC study was initiated by compiling the following list of quality attributes that 
requirements specifications are expected to exhibit: Completeness, Consistency, Correctness, 
Modifiability, Ranking, Traceability, Non-ambiguity, and Verifiability. As a practical matter, 
it is generally accepted that requirements specifications should also be Valid and Testable. 
These characteristics are not independent. A specification, obviously, cannot be correct if it is 
incomplete or inconsistent. 

Most, if not all, of these quality attributes are subjective. A conclusive assessment of a 
requirements specification’s appropriateness requires review and analysis by technical and 
operational experts in the domain addressed by the requirements. Several of these quality 
attributes, however, can be linked to primitive indicators that provide some evidence that the 
desired attributes are present or absent. 

2.2 Specification Quality Indicators 

Although most of the quality attributes of documented requirements are subjective, there are 
aspects of the documentation which can be measured and therefore can be used as indicators 
of quality attributes. Nine categories of quality indicators for requirement documents and 
specification statements were established for two types of classification: those related to the 
examination of individual specification statements, and those related to the requirements 
document as a whole. The categories related to individual specification statements are: 
Imperatives, Continuances, Weak Phrases, Directives, and Options. The categories of 
indicators related to the entire requirements document are: Size, Specification Depth, 
Readability, and Text Structure. 

• IMPERATIVES are those words and phrases that command that something must be 
provided. “Shall” normally dictates the provision of a functional capability; “Must” or 
“must not” normally establishes performance requirements or constraints; “Will” normally 
indicates that something will be provided from outside the capability being specified. The 
ARM report lists the imperatives and their associated counts in descending order of 
forcefulness. An explicit specification will have most of its counts high in the report's 
IMPERATIVE list (i.e., shall, must, required). 

• CONTINUANCES are phrases such as “the following:” that follow an imperative and 
precede the definition of lower level requirement specification. The extent to which 
CONTINUANCES are used is an indication that requirements have been organized and 
structured. These characteristics contribute to the tractability and maintenance of the 
subject requirement specification. However, extensive use of continuances indicates 
multiple, complex requirements that may not be adequately factored into development 
resource and schedule estimates. 

• WEAK PHRASES are clauses that are apt to cause uncertainty and leave room for 
multiple interpretations. Use of phrases such as “adequate” and “as appropriate” indicates 
that what is required is either defined elsewhere or, worse, that the requirement is open to 
subjective interpretation. Phrases such as “but not limited to” and “as a minimum” 
provide the basis for expanding requirements that have been identified or adding future 
requirements. The WEAK PHRASE total is an indication of the extent to which the 
specification is ambiguous and incomplete. 

• DIRECTIVES are words or phrases that indicate that the document contains examples or 
other illustrative information. DIRECTIVES point to information that makes the specified 
requirements more understandable. The implication is that the higher the total number of 
DIRECTIVES, the more precisely the requirements are defined. 

• OPTIONS are those words that give the developer latitude in the implementation of the 
specification that contains them. This type of statement loosens the specification, reduces 
the acquirer’s control over the final product, and establishes a basis for possible cost and 
schedule risks. 

• LINES OF TEXT are the number of individual lines of text read by the ARM program 
from the source file. 

• UNIQUE SUBJECTS is the count of unique combinations and permutations of words 
immediately preceding imperatives in the source file. This count is an indication of the 
scope of the document. The ratio of unique subjects to the total for SPECIFICATION 
STRUCTURE is also an indicator of the specifications’ detail. 

• READABILITY STATISTICS are a category of indicators that measure how easily an 
adult can read and understand the requirements document. The Flesch-Kincaid Grade Level 
index is based on the average number of syllables per word and the average number 
of words per sentence. (For the project of this paper, the score indicates a grade school 
level.) 
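
The counting of statement-level indicators described above can be illustrated with a short sketch. It is not the ARM implementation: the term lists are copied from the categories in Table 1, while the whole-word, case-insensitive matching rule, the function name `count_indicators`, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

# Term lists copied from the categories in Table 1. How the real ARM tool
# matches terms is not described in the paper; whole-word, case-insensitive
# matching is an assumption of this sketch.
INDICATORS = {
    "imperatives": ["shall", "must", "will", "should", "is required to",
                    "are applicable", "responsible for"],
    "continuances": ["as follows", "following", "listed", "in particular",
                     "support"],
    "weak_phrases": ["adequate", "as applicable", "as appropriate",
                     "as a minimum", "be able to", "be capable", "easy",
                     "effective", "not limited to", "if practical"],
    "directives": ["figure", "table", "for example", "note:"],
    "options": ["can", "may", "optionally"],
}


def count_indicators(text):
    """Return {category: Counter(term -> occurrences)} for whole-word matches."""
    lowered = text.lower()
    report = {}
    for category, terms in INDICATORS.items():
        counts = Counter()
        for term in terms:
            # The lookarounds keep "can" from matching inside "cannot".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            hits = len(re.findall(pattern, lowered))
            if hits:
                counts[term] = hits
        report[category] = counts
    return report


# Hypothetical requirement text, invented for this example.
sample = ("The system shall log all commands. The operator may, as appropriate, "
          "disable logging. The display must be easy to read.")
report = count_indicators(sample)
for category, counts in report.items():
    print(category, sum(counts.values()), dict(counts))
```

A per-category total of this kind corresponds to one cell of Table 1; the per-term counts correspond to the detail the ARM report lists for each category.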

Table 1 below shows the summary statistics for 41 NASA requirement documents and the 
results for the project discussed in this paper, Project X. 


                              41 DOCUMENTS 
Indicator                Median      Mean       Max     Min     Stdev   Project X 
Lines of Text             927.0   1,569.9   7,499.0    36.0   1,758.3      11,596 
Imperatives               192.5     415.1   2,004.0    25.0     468.7       1,982 
Continuances               70.5     155.4   1,023.0     8.0     202.2         620 
Weak Phrases               24.0      41.0     249.0     0.0      51.3         374 
Directives                 13.0      25.0     119.0     0.0      28.7         132 
Options                    20.0      39.6     169.0     0.0      45.4         177 
Subjects                   76.0     173.5     804.0    12.0     203.2         510 
Flesch-Kincaid Grade Lvl   11.4      10.8      13.8     7.8       1.6           9 

Indicator definitions: 
Lines of Text - count of the physical lines of text 
Imperatives - shall, must, will, should, is required to, are applicable, responsible for 
Continuances - as follows, following, listed, in particular, support 
Weak Phrases - adequate, as applicable, as appropriate, as a minimum, be able to, be 
capable, easy, effective, not limited to, if practical 
Directives - figure, table, for example, note: 
Options - can, may, optionally 
Subjects - count of unique identifiers preceding imperatives 
Flesch-Kincaid Grade Lvl - readability index 

Table 1: Summary Statistics 

Two approaches can be applied to compare Project X to the metric database containing 41 
documents. The first approach is to compare Project X to the other projects using standard 
deviations. Since approximately 99% of the projects should fall within +/- 3 standard 
deviations, we mark that range on the graph in Figure 1. 



Figure 1: Document Attributes by Standard Deviation 

However, since Project X is larger than all projects analyzed to date, Figure 1 may present an 
inaccurate picture. Normalizing the data on Lines of Text (Figure 2) yields a different picture 
of Project X in relation to other projects, suggesting that Project X attribute counts are in 
line. The number of weak phrases should, however, be investigated since it indicates potential 
risk. 
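
Both comparisons can be reproduced roughly from the Table 1 figures. The sketch below computes the distance of Project X from the database mean in standard deviations, then a per-line density. One loud assumption: the per-document data behind Figure 2 are not given, so the database density is approximated by the ratio of the mean count to the mean Lines of Text, which is not the same as averaging each document's own ratio. The names `TABLE_1`, `z_score`, and `per_line` are introduced here for illustration.

```python
# (mean, standard deviation, Project X value), taken from Table 1.
TABLE_1 = {
    "lines_of_text": (1569.9, 1758.3, 11596),
    "imperatives":   (415.1, 468.7, 1982),
    "continuances":  (155.4, 202.2, 620),
    "weak_phrases":  (41.0, 51.3, 374),
    "directives":    (25.0, 28.7, 132),
    "options":       (39.6, 45.4, 177),
}


def z_score(mean, stdev, value):
    """Distance of value from the mean, in standard deviations."""
    return (value - mean) / stdev


def per_line(count, lines):
    """Indicator count normalized by lines of text."""
    return count / lines


db_lines, _, x_lines = TABLE_1["lines_of_text"]
for name, (mean, stdev, value) in TABLE_1.items():
    z = z_score(mean, stdev, value)
    flag = "outside +/-3 sd" if abs(z) > 3 else "within +/-3 sd"
    line = f"{name:14s} z={z:6.2f} ({flag})"
    if name != "lines_of_text":
        # Ratio of database means: a rough stand-in for the per-document
        # normalization behind Figure 2 (an assumption of this sketch).
        line += (f"  per line: X={per_line(value, x_lines):.3f},"
                 f" db={per_line(mean, db_lines):.3f}")
    print(line)
```

Under these assumptions, most raw counts for Project X lie more than three standard deviations above the mean, yet weak phrases are the only indicator whose per-line density exceeds the database ratio, which is consistent with the observation above.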

Figure 2: Document Attributes Normalized by Lines of Text 


The structure of the document is also indicative of potential project risks. ARM uses the 
structure depth and specification depth to depict two aspects of the document’s structure. 

• STRUCTURE DEPTH provides a count of the numbered statements at each level of the 
source document. These counts provide an indication of the document’s organization, 
consistency, and level of detail. High level specifications will usually not have numbered 
statements below a structural depth of four. Detailed documents may have numbered 
statements down to a depth of nine. A document that is well organized and maintains a 
consistent level of detail will have a pyramidal shape (few numbered statements at level 1 
and each lower level having more numbered statements than the level above it). 
Documents that have an hour-glass shape (many numbered statements at high levels, few 
at mid levels and many at lower levels) are usually those that contain a large amount of 
introductory and administrative information. Diamond shaped documents (a pyramid 
followed by decreasing statement counts at levels below the pyramid) indicate that 
subjects introduced at the higher levels are probably addressed at different levels of detail. 

• SPECIFICATION DEPTH is a count of the number of imperatives at each level of the 
document. These numbers also include the count of lower level list items that are 
introduced at a higher level by an imperative that is followed by a continuance. This 
structure has the same implications as the numbering structure. However, it is significant 
because it reflects the structure of the requirements as opposed to that of the document. 
Differences between the shape of the numbering and specification structure are an 
indication of the amount and location of background and/or introductory information that is 
included in the document. The ratio of the total for SPECIFICATION STRUCTURE to total 
lines of text is an indication of how concise the document is in specifying requirements. 
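
The structure depth count and the pyramidal-shape test can be sketched as follows. This is an illustration only: the paper does not say which numbering formats ARM recognizes, so the sketch assumes statements begin with dotted decimal numbers such as "3.2.1", and the sample document is invented. A prose line that happens to start with a number would be miscounted by this simple pattern.

```python
import re
from collections import Counter

# Assumes statements are numbered like "3", "3.2" or "3.2.1" at line start;
# the numbering formats ARM actually recognizes are not given in the paper.
NUMBERED = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+\S")


def structure_depth(lines):
    """Count numbered statements at each nesting level (1 = top level)."""
    depth_counts = Counter()
    for line in lines:
        match = NUMBERED.match(line)
        if match:
            depth_counts[match.group(1).count(".") + 1] += 1
    return dict(sorted(depth_counts.items()))


def is_pyramidal(depth_counts):
    """True when every level has more statements than the level above it."""
    counts = [depth_counts[level] for level in sorted(depth_counts)]
    return all(upper < lower for upper, lower in zip(counts, counts[1:]))


# Hypothetical document fragment, invented for this example.
doc = [
    "1 Scope",
    "1.1 Purpose",
    "1.2 Overview",
    "1.2.1 The system shall archive telemetry.",
    "1.2.2 The system shall report faults.",
    "1.2.3 The system shall accept operator commands.",
    "Unnumbered commentary is ignored.",
]
counts = structure_depth(doc)
print(counts, is_pyramidal(counts))
```

Counting only the numbered lines that also contain an imperative would give a comparable profile for specification depth, so the two shapes could be compared level by level.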

The application of this information is still under investigation, and initial results from Project 
X are interesting. Figure 3 depicts expected structure versus actual structure of the 
Specification and Design requirement documents. The project data suggests the Specification 
requirements may have been overly defined, therefore artificially constraining the design and 
its expansion. The structure of the imperative levels in the Design document reinforces this 
observation, indicating little expansion where extensive expansion is expected. 








3. Requirement Metrics 

This section of the paper focuses on the application of metrics available through the use of a 
requirements management CASE tool. These metrics assist project managers and quality 
assurance engineers in identifying risks to ensuring that the completed software system 
contains the functionality specified by the requirements. There are no published or industry 
standard guidelines for these metrics; intuitive interpretations, based on experience and 
supported by project feedback, are used in this paper. Project management has reacted 
favorably to the metrics and has used the analysis results to mitigate certain perceived risks. 
The SATC continues working on methods to mathematically validate the intuitive guidelines 
so that the requirement metrics and their interpretation are applicable to an ever increasing 
variety of software development applications. Three areas of requirement metrics will be 
discussed: Stability Over Time Per Requirement Design Level, Stability Over Time By Project 
Build, and Expansion From Specification To Design Level. 

3.1 Requirement Stability Over Time per Requirement Design Level 


Requirements are developed and baselined at major reviews during the system development 
life cycle. At these milestone reviews, documents containing the requirements are reviewed 
and commented upon. After resolution of the comments, the requirement documents are 
baselined and put under configuration control. Ideally, the rate of change in each level of 
requirements should decrease as a milestone review approaches. Figure 4 shows the count of 
requirements at each level during the six-month period from Preliminary Design Review 
(PDR) through Critical Design Review (CDR). As expected, the Top Level and Specification 
requirements remained stable during this six-month period. The Intermediate Specification 
and Design Level documents both stabilized prior to CDR. 



Figure 4: Requirement Count by Document Level 

3.2 Requirement Stability Over Time By Project Build 


As stated earlier, this system is implemented in three builds with the Specification and Design 
requirements allocated to each of these builds. One of the purposes of a multiple build 
development effort is to minimize the implementation risk associated with any one build by 
ensuring that no single build implements an inordinate number of requirements. Figure 5 shows 
the counts of the Design Requirements (Figure 4) for Build 1 and Build 2. 



Figure 5: Design Requirement Allocation by Build 

This is a different picture of the requirements stability, showing a shift in the number of 
requirements from Build 1 to Build 2 and indicating a potential risk to the schedule of Build 2, 
which should be closely monitored. 

3.3 Requirement Detail Expansion 

In addition to requirement stability, the expansion of the upper level requirements to more 
detailed levels generates potential project risk. Figure 6 shows, for the requirements in the 
Intermediate Specification level, the number of Detail level requirements that reference each 
one. The tails on the expected curve in the upper right indicate a scattering of upper level 
requirements referenced either by very few or very many detailed requirements; however, the 
majority of requirements will have multiple references and result in a bell-shaped curve. 

Project data, however, do not match the expected curve; there is a high number of Intermediate 
Specification requirements that are referenced by only one or very few Detail requirements 
(left hand side of the graph) while other requirements have high numbers of multiple 
references (right hand side of graph). As an example, the Intermediate Specification 
requirement “The system shall have a database.” is linked to 200 Design level requirements. 
The shape of the curve in Figure 6 indicates that the Design analysis is incomplete and that 
the specific requirements are not adequately decomposed, suggesting that requirements were 
copied with neither analysis nor expansion into detail for the implementation phase. 
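
The distribution plotted in Figure 6 can be derived from trace links held in a requirements database. The sketch below is a minimal illustration, assuming the links are available as pairs of (Detail requirement, Intermediate Specification requirement). The identifiers and link data are invented, and `reference_histogram` is a hypothetical helper, not part of any CASE tool.

```python
from collections import Counter


def reference_histogram(links, parents):
    """Map 'number of referencing Detail requirements' to 'number of parents'.

    links is a list of (detail_id, intermediate_id) trace links; parents
    lists every Intermediate Specification requirement, so that parents
    with no references are counted under 0.
    """
    fan_out = Counter(parent for _, parent in links)
    return Counter(fan_out.get(parent, 0) for parent in parents)


# Hypothetical trace links, invented for illustration only.
parents = ["IS-1", "IS-2", "IS-3", "IS-4"]
links = [("D-1", "IS-1"), ("D-2", "IS-2"), ("D-3", "IS-3"),
         ("D-4", "IS-4"), ("D-5", "IS-4"), ("D-6", "IS-4"), ("D-7", "IS-4")]
histogram = reference_histogram(links, parents)
print(sorted(histogram.items()))
```

In this invented data set three parents are referenced once and one parent is referenced four times, a small-scale version of the two-sided shape described for the project data.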

4. Conclusions 

Based on the work done to date, four conclusions can be reached: 

• Requirement metrics assist in identifying potential project risks 

• Multiple metrics are needed for comprehensive evaluation 

• Evaluation of requirement text can yield risk information very early in the life cycle 

• Metric collection is cheaper, faster and more reliable with requirement management tools 

Using automated tools to track requirements has opened the door to deriving metrics for 
characterizing requirement text, stability, and expansion rate. Tracking and correlating test 
cases and test results to individual requirements within a database is essential for viewing 
relationships not otherwise available. The use of an automated requirements database allows 
the metrics program to generate metrics for best insight into the requirements and test case 
interplay. The metrics presented in this paper are the result of much research into data use 
and pictorial display; however, all results have been used by project management to 
successfully identify and manage risks. 

1 Introduction 

Studies have shown that the majority of errors in software systems are due to incorrect requirements 
specifications. The root cause of many requirements errors is the imprecision and ambiguity that 
arise because the software requirements are expressed in natural language. An effective way to 
reduce such errors is to express requirements in a formal notation. For a number of years, researchers 
at the Naval Research Laboratory (NRL) have been working on a formal method based on tables to 
specify the requirements of practical systems [2, 11]. Known as the Software Cost Reduction (SCR) 
method, this approach was originally formulated to document the requirements of the Operational 
Flight Program (OFP) for the U.S. Navy’s A-7 aircraft [2]. Since SCR’s introduction more than 
a decade ago, many industrial organizations, including Lockheed, Grumman, and Ontario Hydro, 
have used SCR to specify requirements. Recently, NRL has developed both a formal state machine 
model [12, 14] to define the SCR semantics and a set of software tools to support analysis and 
validation of SCR requirements specifications [10]. The tools support consistency and completeness 
checking, simulation, and model checking. 

To evaluate the SCR method and toolset, we recently used SCR to produce a black box requirements 
specification of a simplified mode control panel for the Boeing 737 autopilot. Beginning with 
the English language description of the system presented in [4], we represented the environmental 
quantities that the computer system monitors (e.g., the pilot switches, dials, and sensors) and 
the environmental quantities that the computer system controls (i.e., the individual displays) as 
monitored and controlled variables. We then used these variables and the SCR tabular notation to 
specify the requirements of the mode control panel. The heart of the specification is the relation 
REQ, the required relation between the monitored and controlled variables [20]. 

In this paper, we use the autopilot mode control panel as an example for comparing and contrasting 
the SCR approach to requirements specification and analysis with the approach used in 
[4]. The latter approach uses the formal language of SRI’s Prototype Verification System (PVS) 
[17] to represent the requirements of the mode control panel and then applies the automated 
reasoning provided by PVS to analyze the specification. Formulating the requirements specification 
for the mode control panel in SCR exposed a number of problems, including a missing input event, 
an incorrect assumption about the environment, and a misinterpretation of the prose description. 
We also discovered that because parts of the PVS specification are highly abstract, certain key 
aspects of the system’s requirements are omitted. In contrast, the SCR approach makes explicit 
many important questions about the required behavior of the mode control panel. We conclude 
with a discussion of general issues such as the appropriate level of abstraction for documenting 
requirements, the choice of notation, the kinds of analyses that can be done on the specification, 
the relation between different kinds of analyses, and the role of tool support. Appendix B contains 
the complete SCR requirements specification of the mode control panel. 

*This work was supported by the Office of Naval Research. 

2 Motivation and Background 

It is widely acknowledged that requirements are a major source of errors during the development of 
large software systems [1, 9, 16]. For example, studies by Lutz [16] have shown that functional and 
interface requirements were the source of a majority of safety-related software errors in NASA’s 
Voyager and Galileo spacecraft. There is no doubt that getting a complete and consistent 
characterization of software requirements is inherently hard. However, there are failings in the software 
development process, including the requirements process, that can be rectified by improved practice 
[8]. A disciplined and rigorous approach to the analysis and specification of software requirements 
can address many difficulties that result from such failings. 

The goal of the requirements phase is to create a document, the Software Requirements 
Specification (SRS), to precisely describe the problem to be solved and to accurately characterize the 
set of acceptable solutions to the problem. The effectiveness of the requirements phase is 
determined by the extent to which the SRS is precise, unambiguous and consistent (i.e., its correctness), 
whether it captures all the results of the analysis (i.e., its completeness), and its usability. The 
usability criteria are ease of change (i.e., its modifiability), whether the notation is understandable 
by both customers and developers (i.e., its readability), its organization for easy reference 
and review (for instance, one should quickly be able to find answers to specific questions about 
the requirements), and organization for ease of change. In addition, the underlying conceptual 
model and notation of the SRS should support formal analyses such as validation (to ensure that 
the specification describes the intended requirements), and verification (which establishes that the 
specification satisfies critical properties of interest). Finally, the method should provide guidelines 
that support decisions on organization and modification of the SRS. By sufficiently constraining 
the underlying semantic model, these guidelines ensure that the quality of the SRS does not depend 
too much on the level of expertise of its writer(s). 

2.1 The SCR Method 

Unlike traditional research on requirements, which concentrates on the requirements analysis 
process, the focus of the SCR work at the Naval Research Laboratory is on issues that influence the 
creation and maintenance of the SRS. By identifying desirable properties of an SRS, the SCR 
project has developed a set of guidelines for writing the SRS [11, 8]. These guidelines include 
separation of concerns, information hiding, and the use of a readable yet formal notation. For 
example, the guideline separation of concerns supports usability, modifiability, and verifiability of 
the SRS. Moreover, the notation supported by the SCR method is designed to be understandable 
by both customers and software developers. Underlying the notation is a mathematical model 
which supports completeness and consistency checking, validation, test case generation, and formal 
verification. 

To support the SCR method, NRL has developed a set of software tools for analysis and 
validation of SCR requirements specifications [10, 13]. The tools include a specification editor for 
creating and modifying the specifications, a simulator for symbolic execution, and tools for formal 
analysis. The latter include a consistency checker which uncovers application-independent errors 
such as syntax and type errors, missing cases, and unwanted nondeterminism, and a verifier which 
checks a specification for critical application-specific properties. 

2.2 PVS 

PVS (Prototype Verification System) [17] is an environment for specification and verification 
developed at SRI International. The PVS system is built around a highly expressive specification 
language. The system has a number of predefined theories, and comes with a very effective 
interactive theorem prover in which most of the low-level proof steps are automated. The PVS 
specification language is based on higher-order logic with a richly expressive type system. The 
PVS prover consists of a powerful collection of inference steps which include arithmetic and 
equality decision procedures, automatic rewriting, and boolean simplification. PVS has been applied 
to a number of practical problems [4, 5, 21]. Many organizations, including NASA, have used the 
PVS specification language for documenting software requirements. 

3 Comparison of PVS with the SCR method 

In this section, we address some of the strengths and limitations of using PVS, and compare the 
PVS approach to the SCR method. We base our comparison on the assumption that a notation (and 
associated tools) should support the following process, which may be thought of as an idealization 
of a real-world process for requirements analysis [19]. 

1. SRS Creation: The results of problem analysis are captured in the SRS, using a formal 
notation. 

2. SRS Checking: The SRS is checked for proper syntax, type correctness, consistency, 
completeness, and other application-independent properties, using an automated checker. 

3. SRS Validation: The goal of this phase is to ensure that the SRS captures the customers’ 
intent. This is achieved by symbolically executing the SRS using a simulator. 

4. SRS Verification: This phase verifies that certain crucial application-specific properties, such 
as safety and security properties, hold for the SRS. Verification is carried out by using an 
interactive theorem prover or by “lightweight” analysis tools such as model checkers. 

3.1 SRS Creation 


The choice of notation, and availability of guidelines to support decisions on SRS organization 
and modification, are factors which influence this phase. A simpler, more restrictive notation is 
preferable to a more powerful, expressive one. In addition to ease of use, a restricted semantic 
model can provide guidelines for creating and organizing the SRS. A well-designed notation will 
help even novices create good specifications. 

The PVS system is built around a highly expressive specification language. However, most 
developers, being unfamiliar with higher-order logic (the underlying formalism of the PVS 
specification notation), lambda expressions, higher-order functions and quantification, etc., find the 
notation hard to use. It has also been our experience that the expressive power of higher-order 
logic is seldom required for requirements specification of most practical systems. The organizing 
unit for PVS specifications is the “Theory”. The PVS language lacks structures to support 
readability and ease of change. It is very hard for novices to create good PVS specifications. For 
example, it has been observed by Young [22] that the quality of specifications in PVS depends to 
a large extent on the expertise of the specification writer. 

The SCR method is suitable for embedded, real-time systems, i.e., for systems that sense and 
control quantities in their environment [20]. The SCR method includes a systematic approach 
for capturing requirements [11, 15, 6], and is based on a tabular notation which has a formal 
mathematical basis [12, 13, 14]. The SCR notation, having been tailored to a specific class of 
problems, sacrifices generality for ease of use and improved support for analysis. Most engineers 
find the tabular notation easy to use and understand. Also, tables afford a natural organization 
which permits independent construction, review, modification, and analysis of smaller parts of a 
large requirements specification. 

It has been observed that in comparison to graphical notations and (structured) text, tabular 
notations scale very well to large problems. According to Parnas, the specification of the shutdown 
system for the Darlington Nuclear Power Plant [18] weighed more than 20 kilograms on paper. In 
our own experience, we have come across examples of SCR requirements specifications for practical 
systems (e.g., the OFP for the C-130J aircraft [7]) containing more than a thousand tables. 

3.2 SRS Checking 

In addition to checks for incorrect syntax, the PVS language has a rich type system which supports 
rigorous typechecking. Typechecking in PVS is undecidable, which means that it 
cannot be completely automated. In most situations, the PVS typechecker will generate proof 
obligations which have to be proved using the interactive prover. Such proofs amount to a very 
strong consistency check on some aspects of the specification. 

The consistency and completeness checker of the SCR toolset verifies application-independent 
properties derived automatically from the requirements model. These checks ensure that a 
specification is well-formed by identifying syntax and type errors, incompleteness, missing initial values, 
unreachable modes, and circular definitions. The tool also identifies missing cases and undesirable 
nondeterminism. All these checks are carried out automatically. 
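
The two table checks mentioned above, missing cases and unwanted nondeterminism, amount to testing that the conditions in a row cover every state and are pairwise disjoint. The sketch below illustrates that idea by brute-force enumeration over a small finite domain. It is not how the SCR toolset performs the check, which this paper does not describe; the function name, the example row, and its deliberately overlapping conditions are all invented for illustration.

```python
from itertools import product


def check_condition_row(conditions, domains):
    """Check one row of a condition table by exhaustive enumeration.

    conditions: dict of name -> predicate over a state dict.
    domains: dict of variable -> iterable of possible values (finite).
    Returns (missing, overlapping): states satisfying no condition, and
    states satisfying more than one condition.
    """
    names = list(domains)
    missing, overlapping = [], []
    for values in product(*(domains[n] for n in names)):
        state = dict(zip(names, values))
        true_conditions = [c for c, pred in conditions.items() if pred(state)]
        if not true_conditions:
            missing.append(state)            # coverage violated
        elif len(true_conditions) > 1:
            overlapping.append((state, true_conditions))  # disjointness violated
    return missing, overlapping


# Hypothetical row: an indicator shows "low", "ok" or "high" for a level 0..4.
domains = {"level": range(5)}
conditions = {
    "low":  lambda s: s["level"] < 2,
    "ok":   lambda s: 2 <= s["level"] <= 3,
    "high": lambda s: s["level"] >= 3,   # deliberately overlaps "ok" at 3
}
missing, overlapping = check_condition_row(conditions, domains)
print("missing cases:", missing)
print("nondeterminism:", overlapping)
```

Enumeration only works for small finite domains; it is used here because it makes the two properties easy to see, not because it scales to specifications over integer-valued variables.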

3.3 SRS Validation 


PVS does not support validation. 

The tabular notation of SCR supports validation by inspection and simulation. Most domain 
experts find this notation easy to read and review. For example, Parnas [18] observes that the 
utility of the tabular notation was evident during the formal review of the Darlington specification. 
During the review, each “case” and its associated subcases could be reviewed individually and 
independently of other “cases”. The tabular notation also forces one to consider all possible scenarios. 
Further, we show in [3] that theorems that are true of certain fragments of an SCR requirements 
specification also hold for the whole specification. 

The simulator in the SCR toolset performs symbolic execution of the underlying state machine 
model, which allows users to assess system behavior in specific “use cases” directly from the 
requirements specification. The simulator can expose problems, such as missing requirements and 
incorrectly stated requirements, that cannot be detected by verification techniques. 

3.4 SRS Verification 

Using PVS, one can establish, by interactive theorem proving, properties that are deemed to be 
true of a requirements specification. However, few practitioners have the mathematical 
sophistication required to carry out such proofs. The state-of-the-art theorem prover of PVS does ameliorate 
the problem by including powerful decision procedures that automate parts of a proof that would 
otherwise require user guidance. Very often, a property will not hold for a requirements 
specification. In such a case, either the formulation of the property is incorrect, or the specification is 
wrong (or both). Proper diagnosis and user feedback are therefore very important to help correct 
the problem. Theorem provers provide very little help in such situations because theorem proving 
is incomplete; i.e., if one is unable to prove a theorem using a theorem prover, then all one can 
conclude is that the theorem prover failed to find a proof (the theorem may be true). On the other 
hand, methods such as model checking are complete: if a model checker reports that a theorem 
is false, it is false. Additionally, most model checkers will provide a counterexample that falsifies 
the theorem. PVS does support model checking for a limited subset of the language, but provides 
no counterexample. 

The SCR toolset supports proof of safety properties of a requirements specification using state 
exploration based model checking [3]. One of the main design goals of our toolset is to provide 
proper error diagnosis by generating understandable counterexamples for user feedback. Future 
plans include support for other forms of model checking and automatic theorem proving. Since the 
underlying model of the SCR notation is a state machine, several other verification activities can be 
supported. For instance, we plan to automatically generate test-cases from an SCR specification, 
to assist in black-box testing of implementations. In certain limited contexts, it should also be 
possible to automatically generate code directly from an SCR requirements specification. 

4 The Autopilot Requirements Specification 

To illustrate the SCR method, we consider a simplified mode control panel for the Boeing 737 
autopilot as discussed in [4]. The mode control panel for the autopilot is shown in Figure 1. 

Figure 1: Mode Control Panel 





The system monitors the aircraft’s altitude (ALT), flight path angle (FPA) and calibrated air 
speed (CAS). The panel includes three displays which show the current values for altitude, flight 
path angle, and airspeed of the aircraft. The pilot may enter a new value into a display by 
“dialing-in” the value using one of three knobs next to the displays. The pilot engages or disengages the 
autopilot by pressing one of four buttons on the panel. Appendix A contains a description of 
the system in English prose (adapted from [4]). Below, we informally present the steps taken to 
document the requirements using the SCR notation. 

In SCR, the required system behavior is described by REQ, the required relation between 
monitored variables, environmental quantities that the system monitors, and controlled variables, 
environmental quantities that the system controls [20]. To specify this relation concisely, the SCR 
approach uses four constructs - modes, terms, conditions, and events. A mode class is a variable 
whose values are system modes (or simply modes), while a term is any function of monitored 
variables, modes, or other terms. A condition is a predicate defined on one or more system entities 
(an entity is a monitored or controlled variable, mode class, or term). An event occurs when the 
value of any system entity changes. The notation “@T(c) WHEN d” denotes a conditioned event,
defined as

@T(c) WHEN d  ≝  ¬c ∧ c′ ∧ d,

where the unprimed condition c is evaluated in the “old” state, and the primed condition c′ is
evaluated in the “new” state. The notation “@F(c)” denotes the event @T(NOT c). The environment
may change a monitored quantity, causing an input event. In response, the system changes
controlled quantities and updates terms and mode classes.
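The definition of a conditioned event can be made concrete with a small sketch. The Python below is illustrative only and is not part of the SCR toolset: it assumes a state is a dictionary from entity names to values and that the WHEN condition d is evaluated in the old state. The helper names at_t, at_f, and always are hypothetical.

```python
from typing import Callable, Mapping

State = Mapping[str, object]
Condition = Callable[[State], bool]
Event = Callable[[State, State], bool]


def always(_state: State) -> bool:
    """Default WHEN condition: no extra guard."""
    return True


def at_t(c: Condition, when: Condition = always) -> Event:
    """@T(c) WHEN d: c is false in the old state, true in the new state,
    and d (assumed here to be evaluated in the old state) holds."""
    return lambda old, new: (not c(old)) and c(new) and when(old)


def at_f(c: Condition, when: Condition = always) -> Event:
    """@F(c) is shorthand for @T(NOT c)."""
    return at_t(lambda s: not c(s), when)


# Example: @T(mALTsw = on) WHEN tALTpresel
alt_pressed = at_t(lambda s: s["mALTsw"] == "on", when=lambda s: s["tALTpresel"])

old_state = {"mALTsw": "off", "tALTpresel": True}
new_state = {"mALTsw": "on", "tALTpresel": True}

fires = alt_pressed(old_state, new_state)      # switch goes from off to on
no_change = alt_pressed(new_state, new_state)  # switch was already on: no event
```

Representing an event as a predicate over a pair of states keeps the old/new distinction of the definition explicit.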

We begin by identifying the monitored quantities, i.e., the environmental quantities that the 
autopilot system monitors, and denote them by corresponding monitored variables. We use the 
prefix “m” for all monitored variable names. Each monitored variable is of a certain type, which 
specifies the range of values that may be assigned to that variable. The autopilot system
monitors the actual altitude (denoted by monitored variable mALTactual), the actual flight path angle
(mFPAactual), and the actual calibrated air speed (mCASactual). We assume these variables to 
range over the integers. Switches ALTsw, ATTsw, CASsw, and FPAsw are denoted respectively by 
mALTsw, mATTsw, mCASsw, and mFPAsw. These monitored variables may take on one of the values 
from the set {on, off}. Finally, knobs ALTdesired, CASdesired, and FPAdesired are denoted by 
monitored variables mALTdesired, mCASdesired, and mFPAdesired respectively, which range over 
the integers. 

We then identify the controlled quantities, i.e., the environmental quantities that the autopilot 
system controls, and denote them by corresponding controlled variables. We use the prefix “c” for 
all controlled variable names. Just as for monitored variables, we assign a type to each controlled 
variable. For simplicity of exposition we shall, as in [4], only model the mode-control panel itself, 
and not the commands that will be sent out to the flight-control computer. The three controlled 
quantities of the mode control panel are ALTdisplay, FPAdisplay, and CASdisplay, which we 
denote respectively by cALTdisplay, cFPAdisplay, and cCASdisplay. We assume these values to 
range over the integers. 

We model the primary modes of the mode-control panel by the modeclass Status, denoted by 
variable mcStatus. The variable can take on any value in the set {ALTmode, ATTmode, FPAmode}.
The altitude engaged mode being “armed” is denoted by a boolean term variable tARMED (we use 
the prefix “t” for terms). If tARMED is true, then mcStatus should be FPAmode. The previous
sentence is an example of a property of the specification which we may later want to prove. We 
also define a boolean valued term tCASmode, to model the system being in the calibrated air speed 
mode. By describing the status of the mode-control panel in this manner, we have ensured that 
the following sentences in the prose requirements are trivially satisfied: 

1. Only one of the three modes ALTmode, ATTmode, or FPAmode can be engaged at any time. 

2. One of the three modes, ATTmode, FPAmode, or ALTmode should be engaged at all times. 

3. Engaging any of the three modes will automatically cause the other two to be disengaged since 
only one of these three modes can be engaged at a time. 

4. The mode CASmode can be engaged at the same time as any of the other modes. 

We define three boolean valued terms tALTpresel, tCASpresel, and tFPApresel to denote 
whether the corresponding quantity has been pre-selected by dialing in a new value using one of 
the three knobs. Finally, we define a boolean term tNear to denote the predicate
mALTdesired - mALTactual < 1200.
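As a rough illustration of the typing step, the monitored variables and the term tNear could be written down as follows. This is a sketch in Python rather than SCR notation; the class names Switch and Monitored and the function t_near are hypothetical, and the default values of 0 and off follow the initial values given in Appendix B.

```python
from dataclasses import dataclass
from enum import Enum


class Switch(Enum):
    OFF = "off"
    ON = "on"


@dataclass(frozen=True)
class Monitored:
    """Monitored variables of the mode control panel, with initial values."""
    mALTactual: int = 0
    mFPAactual: int = 0
    mCASactual: int = 0
    mALTsw: Switch = Switch.OFF
    mATTsw: Switch = Switch.OFF
    mCASsw: Switch = Switch.OFF
    mFPAsw: Switch = Switch.OFF
    mALTdesired: int = 0
    mCASdesired: int = 0
    mFPAdesired: int = 0


def t_near(m: Monitored) -> bool:
    """Term tNear: mALTdesired - mALTactual < 1200."""
    return m.mALTdesired - m.mALTactual < 1200


# A 1,000 ft climb counts as "near"; a 15,000 ft climb does not.
climb_small = Monitored(mALTactual=10_000, mALTdesired=11_000)
climb_large = Monitored(mALTactual=10_000, mALTdesired=25_000)
```

Because tNear is a function of monitored variables only, it needs no stored state of its own.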

The behavior of mode class mcStatus is specified in a mode transition table. In the following, 
the expression CHANGED(x) denotes the event “variable x has changed”. The table defines all events
that change the value of the mode class mcStatus. For example, the first row of the table states, “If 
mcStatus is ALTmode, and mATTsw is switched on, or the setting of knob mALTdesired is changed, 
then mcStatus changes to ATTmode.” Events that do not change the value of the mode class are 
omitted from the table. 


Source Mode | Events                                         | Destination Mode
------------+------------------------------------------------+-----------------
ALTmode     | @T(mATTsw = on) OR CHANGED(mALTdesired)        | ATTmode
ALTmode     | @T(mFPAsw = on)                                | FPAmode
ATTmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear)    | ALTmode
ATTmode     | @T(mFPAsw = on) OR @T(mALTsw = on) WHEN        | FPAmode
            | (tALTpresel AND NOT tNear)                     |
FPAmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear) OR | ALTmode
            | @T(tNear) WHEN tARMED                          |
FPAmode     | @T(mATTsw = on) OR @T(mFPAsw = on) OR          | ATTmode
            | CHANGED(mALTdesired) WHEN tARMED               |


Each row in the mode transition table above corresponds to certain sentences in the prose 
requirements. We describe this correspondence below. Here, “paragraph x” refers to the numbered 
paragraph x of the prose requirements in Appendix A. 

Row 1. The pilot engages a mode by pressing the corresponding button on the panel (paragraph 1), i.e.,
pressing ATTsw should engage ATTmode OR If the pilot dials in a new altitude while ALTmode
is engaged, then ALTmode is disengaged and ATTmode is engaged (paragraph 7).

Row 2. The pilot engages a mode by pressing the corresponding button on the panel (paragraph 1), i.e.,
by pressing FPAsw the pilot engages FPAmode.

Row 3. The pilot engages a mode by pressing the corresponding button on the panel (paragraph 1), i.e.,
pressing ALTsw engages ALTmode. However, the altitude must be pre-selected before ALTsw is
pressed (paragraph 4). If the pilot dials an altitude that is more than 1,200 feet above ALTactual
and then presses ALTsw, then ALTmode will not directly engage (paragraph 3).

Row 4. The pilot engages a mode by pressing the corresponding button on the panel (paragraph 1), i.e.,
by pressing FPAsw the pilot engages FPAmode OR If the pilot dials into ALTdesired an altitude
that is more than 1,200 feet above ALTactual and then presses ALTsw, then ALTmode will not
directly engage. Instead, the altitude engage mode will change to “armed” and FPAmode is
engaged (paragraph 3).

Row 5. The situation described for row 3 above OR Instead, the altitude engage mode will change to
“armed” and FPAmode is engaged. [. . .] FPAmode will remain engaged until the aircraft is within
1,200 feet of ALTactual, then ALTmode is automatically engaged (paragraph 3).

Row 6. The pilot engages a mode by pressing the corresponding button on the panel (paragraph 1), i.e.,
by pressing ATTsw the system enters ATTmode OR FPAsw toggles on and off every time it is
pressed (paragraph 5) OR If the pilot dials in a new altitude while the altitude engage mode is
“armed” then ATTmode is engaged. [. . .] FPAmode should be disengaged as well (paragraph 7).
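The mode transition table for mcStatus can also be read as a function from an old state and a new state to the next mode. The Python sketch below is one possible reading, not generated from the specification: it assumes states are dictionaries keyed by the SCR variable names, that each WHEN clause binds only to the event immediately before it, and that WHEN conditions are evaluated in the old state. The function names are hypothetical.

```python
from typing import Mapping

State = Mapping[str, object]


def turned_on(old: State, new: State, switch: str) -> bool:
    """@T(switch = on): the switch was not on before and is on now."""
    return old[switch] != "on" and new[switch] == "on"


def changed(old: State, new: State, name: str) -> bool:
    """CHANGED(name): the variable has a different value in the new state."""
    return old[name] != new[name]


def next_status(old: State, new: State) -> str:
    """Return the next value of mcStatus according to the six table rows."""
    mode = old["mcStatus"]
    presel, near, armed = old["tALTpresel"], old["tNear"], old["tARMED"]
    alt_on = turned_on(old, new, "mALTsw")
    att_on = turned_on(old, new, "mATTsw")
    fpa_on = turned_on(old, new, "mFPAsw")
    alt_dialed = changed(old, new, "mALTdesired")
    became_near = (not old["tNear"]) and bool(new["tNear"])  # @T(tNear)

    if mode == "ALTmode":
        if att_on or alt_dialed:                              # row 1
            return "ATTmode"
        if fpa_on:                                            # row 2
            return "FPAmode"
    elif mode == "ATTmode":
        if alt_on and presel and near:                        # row 3
            return "ALTmode"
        if fpa_on or (alt_on and presel and not near):        # row 4
            return "FPAmode"
    elif mode == "FPAmode":
        if (alt_on and presel and near) or (became_near and armed):  # row 5
            return "ALTmode"
        if att_on or fpa_on or (alt_dialed and armed):        # row 6
            return "ATTmode"
    return str(mode)  # events not listed in the table leave the mode unchanged


idle = {"mcStatus": "ATTmode", "mALTsw": "off", "mATTsw": "off", "mFPAsw": "off",
        "mALTdesired": 25000, "tALTpresel": True, "tNear": False, "tARMED": False}
press_alt = {**idle, "mALTsw": "on"}
far_target_mode = next_status(idle, press_alt)  # row 4 applies
```

The order in which rows are tested matters only if two rows could fire on the same step; the sketch assumes a single input changes at a time.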

The behavior of term tARMED is specified in the event table below. Like mode transition tables, 
event tables make explicit only those events that cause the variable defined by the table to change. 
For example, the first entry in the first row states, “If mcStatus is ATTmode or FPAmode and mALTsw 
is turned on when tALTpresel is true and tNear is false, then tARMED becomes true.” The entry 
“NEVER” in an event table means that no event can cause the variable defined by the table to 
assume the value in the same column as the entry; thus, the entry “NEVER” in row 2 of the table 
means that when mcStatus is ALTmode no event can cause tARMED to become true. An entry 
“@T(Inmode)” in a row of a mode transition table or an event table denotes the event “system
entered the corresponding mode”. 

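
Modes            | Events                     |
-----------------+----------------------------+-----------------------
ATTmode, FPAmode | @T(mALTsw = on) WHEN       | @F(mcStatus = FPAmode)
                 | (tALTpresel AND NOT tNear) |
ALTmode          | NEVER                      | @F(mcStatus = FPAmode)
-----------------+----------------------------+-----------------------
tARMED =         | true                       | false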

We finally present the behavior of the display cCASdisplay using the condition table below. This 
table states that “If tCASpresel is true then cCASdisplay has the value mCASdesired; otherwise, 
it has the value mCASactual”. The complete autopilot specification is in Appendix B. 


Conditions    | tCASpresel  | NOT tCASpresel
--------------+-------------+---------------
cCASdisplay = | mCASdesired | mCASactual
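A condition table can be evaluated mechanically: exactly one condition should hold in any state, and the table yields the value in that column. The Python sketch below illustrates this for cCASdisplay. The evaluator and its names are hypothetical, and the rule that exactly one condition must be true is this sketch's way of flagging missing cases and nondeterminism, not a description of the SCR tools.

```python
from typing import Callable, Mapping, Sequence, Tuple

State = Mapping[str, object]
Row = Tuple[Callable[[State], bool], Callable[[State], object]]


def eval_condition_table(rows: Sequence[Row], state: State) -> object:
    """Return the value of the single column whose condition holds."""
    values = [value(state) for cond, value in rows if cond(state)]
    if len(values) != 1:
        # zero matches: a missing case; several matches: nondeterminism
        raise ValueError(f"expected exactly one true condition, found {len(values)}")
    return values[0]


# Condition table for cCASdisplay.
C_CAS_DISPLAY: Sequence[Row] = (
    (lambda s: bool(s["tCASpresel"]), lambda s: s["mCASdesired"]),
    (lambda s: not s["tCASpresel"], lambda s: s["mCASactual"]),
)

shown = eval_condition_table(
    C_CAS_DISPLAY, {"tCASpresel": True, "mCASdesired": 250, "mCASactual": 231}
)
```

The same evaluator would apply unchanged to the tables for cALTdisplay and cFPAdisplay.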
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5 Discussion of General Issues 


In [3] we present a verification technique for proving properties of SCR requirements specifications. 
This technique proved to be valuable in detecting and correcting bugs in the autopilot specification. 
For example, an initial formulation of the specification violated the property “the altitude engage
mode will be ARMED only when the flight path angle select mode is engaged”. The counterexample
generated by the tool helped diagnose the error (we were setting tARMED to true when mcStatus is
ALTmode, and mALTsw is turned on when tALTpresel is true and tNear is false).
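The kind of error described above can be reproduced with a brute-force reachability check. The Python sketch below is not the verification technique of [3]. It is a deliberately simplified stand-in that abstracts the state to the pair (mcStatus, tARMED), treats tALTpresel and tNear as unconstrained inputs, and represents input events by hypothetical names such as ALT_ON. The buggy flag models the initial formulation, in which tARMED could also be set in ALTmode.

```python
from collections import deque
from itertools import product

EVENTS = ("ALT_ON", "ATT_ON", "FPA_ON", "ALT_DIALED", "NEAR_T")


def step(mode, armed, event, presel, near, buggy=False):
    """One step of the abstracted mcStatus and tARMED tables."""
    alt_far = event == "ALT_ON" and presel and not near
    alt_near = event == "ALT_ON" and presel and near
    new_mode = mode
    if mode == "ALTmode":
        if event in ("ATT_ON", "ALT_DIALED"):
            new_mode = "ATTmode"
        elif event == "FPA_ON":
            new_mode = "FPAmode"
    elif mode == "ATTmode":
        if alt_near:
            new_mode = "ALTmode"
        elif event == "FPA_ON" or alt_far:
            new_mode = "FPAmode"
    else:  # FPAmode
        if alt_near or (event == "NEAR_T" and armed):
            new_mode = "ALTmode"
        elif event in ("ATT_ON", "FPA_ON") or (event == "ALT_DIALED" and armed):
            new_mode = "ATTmode"
    arming_modes = ("ATTmode", "FPAmode") + (("ALTmode",) if buggy else ())
    if mode in arming_modes and alt_far:
        new_armed = True
    elif mode == "FPAmode" and new_mode != "FPAmode":
        new_armed = False  # @F(mcStatus = FPAmode)
    else:
        new_armed = armed
    return new_mode, new_armed


def find_violation(buggy=False):
    """Breadth-first search for a state where tARMED holds outside FPAmode.

    Returns a trace (list of steps ending in the bad state) or None.
    """
    start = ("ATTmode", False)
    traces = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        mode, armed = state
        if armed and mode != "FPAmode":
            return traces[state] + [state]
        for event, presel, near in product(EVENTS, (False, True), (False, True)):
            if event == "NEAR_T" and near:
                continue  # @T(tNear) requires tNear to be false beforehand
            nxt = step(mode, armed, event, presel, near, buggy)
            if nxt not in traces:
                traces[nxt] = traces[state] + [(state, event, presel, near)]
                queue.append(nxt)
    return None


correct_trace = find_violation(buggy=False)
buggy_trace = find_violation(buggy=True)
```

Under these assumptions, the search finds no violation for the tables as published, while the buggy variant yields a trace that ends in ALTmode with tARMED true.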

We found that the PVS model does not clearly distinguish a system’s environmental quantities 
from the dependent quantities. Also, by not clearly identifying environmental quantities the system 
monitors, and environmental quantities the system controls, it was very hard to find an answer to 
the question “What is the required behavior of the system?” by examining the PVS model. During 
the process of creating the SCR requirements specification, we came up with several questions for 
which we could not find answers from the PVS model. This is because the PVS description is not 
at the appropriate level of abstraction. 

5.1 Appropriate Level of Abstraction 

The PVS model of the autopilot in [4] is too abstract to serve as a requirements specification, i.e., 
as a black box description of all acceptable system implementations. Rather than specifying the 
required relationship between environmental quantities of the autopilot mode control panel, the 
PVS description is an abstract model of the mode control panel. Therefore, it is not a requirements
specification. For example, the monitored quantity ALTactual is denoted abstractly by two
boolean variables alt_reached and alt_gets_near; boolean variable input_alt abstractly denotes
the pilot “dialing-in” the desired altitude using knob ALTdesired; etc. It is usual to make such
abstractions during verification, because existing methods cannot be directly applied to requirements
specifications, which are too detailed. However, the right approach is to begin by formulating the 
requirements specification, and later to describe formally the relationship between the specification 
and the abstract verification models. If the correspondence between the abstract models and the 
requirements specification is informal (or if the requirements specification is never created), it leaves 
room for misinterpretation. 

5.2 Kinds of Analyses 

In our experience, the first three phases of our idealized process for requirements analysis, viz., SRS 
Creation, SRS Checking, and SRS Validation, are the most crucial ones. It is very likely that a 
large proportion of activities of requirements analysis will be in support of these phases. It is also 
safe to assume that for a majority of projects (barring a small number of projects developing safety 
or mission critical applications) the last phase, i.e., SRS Verification, will be completely skipped. 
Since PVS concentrates exclusively on this phase of analysis, and provides poor support for the 
initial three phases, it is unlikely to be very effective as a tool to support requirements analysis. 
However, PVS has been effective in the analysis of critical algorithms and architectures for fault- 
tolerance, such as the correctness of distributed agreement protocols for a hybrid fault model, and 
in the verification of crucial subsystems, such as a commercial avionics microprocessor. 
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5.3 Role of Tool Support 

In our experience, tools that support a limited analysis domain, with a specific conceptual model, 
tend to be more effective than general purpose tools. If a method lacks a strong underlying
conceptual model, the benefits of automation are likely to be minimal ([8] provides more details). If a
method does not adequately constrain the problem, the corresponding support tools cannot guide 
the developer when making difficult decisions. Since the SCR method standardizes the problem 
domain, the conceptual model, the notation, and the process, significant automated tool support 
is possible. For example, by using information about the current state of a specification, and 
knowledge of the process, a tool can guide developers in making the next step. Also, by providing 
standard templates, a tool can automate the routine activities of SRS creation. By applying the 
SCR method to several industrial problems, we plan to exploit the full potential of such tools. 
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A Description of the autopilot 

1. The mode-control panel contains four buttons for selecting modes and three displays for dialing
in or displaying values, as shown in Figure 1. The system supports the following four modes:
attitude control wheel steering (ATTmode), flight path angle selected (FPAmode), altitude engage
(ALTmode), and calibrated air speed (CASmode).

Only one of the first three modes can be engaged at any time. The mode CASmode can be 
engaged at the same time as any of the other modes. The pilot engages a mode by pressing 
the corresponding button on the panel. One of the three modes, ATTmode, FPAmode, or ALTmode 
should be engaged at all times. Engaging any of the first three modes will automatically cause 
the other two to be disengaged since only one of these three modes can be engaged at a time. 

2. There are three displays on the panel: altitude (ALTdisplay), flight path angle (FPAdisplay),
and calibrated air speed (CASdisplay). The displays usually show the current values of altitude
(ALTactual), flight path angle (FPAactual), and air speed (CASactual) of the aircraft. However,
the pilot can enter a new value into a display by dialing in the value using the knob next
to the display (ALTdesired, FPAdesired, or CASdesired). This is the target or “pre-selected”
value that the pilot wishes the aircraft to attain. For example, if the pilot wishes to climb to
25,000 feet, he will dial 25,000 (using the knob ALTdesired) into ALTdisplay and then press
ALTsw to engage ALTmode. Once the target value is achieved or the mode is disengaged, the
display reverts to showing the “current” value.

3. If the pilot dials into ALTdesired an altitude that is more than 1,200 feet above the current
altitude (ALTactual) and then presses ALTsw, then ALTmode will not directly engage. Instead,
the altitude engage mode will change to “armed” and FPAmode is engaged. The pilot must then
dial in, using the knob FPAdesired, the desired flight-path angle into FPAdisplay, which will
be followed by the flight-control system until the aircraft attains the desired altitude. FPAmode
will remain engaged until the aircraft is within 1,200 feet of ALTactual, then ALTmode is
automatically engaged.

4. CASdesired and FPAdesired need not be pre-selected before the corresponding modes are
engaged; the current values displayed will be used. The pilot can dial-in a different target value
after the mode is engaged. However, the altitude must be pre-selected before ALTsw is pressed.
Otherwise, the command is ignored.

5. CASsw and FPAsw toggle on and off every time they are pressed. For example, if CASsw is 
pressed while the system is already in CASmode, that mode will be disengaged. However, if 
ATTsw is pressed while ATTmode is already engaged, the command is ignored. Likewise, pressing 
ALTsw while the system is already in ALTmode has no effect. 

6. Whenever a mode other than CASmode is engaged, all other pre-selected displays should return 
to current. 

7. If the pilot dials in a new altitude while ALTmode is engaged or the altitude engage mode is 
“armed”, then ALTmode is disengaged and ATTmode is engaged. If the altitude engage mode is 
“armed” then FPAmode should be disengaged as well. 
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B SCR Specification of the autopilot 


Monitored Variables: 

mALTactual, mCASactual, mFPAactual : Integer initially all 0; 
mALTsw, mATTsw, mCASsw, mFPAsw : {on, off} initially all off; 
mALTdesired, mCASdesired, mFPAdesired : Integer initially all 0; 

Controlled Variables: 

cALTdisplay, cCASdisplay, cFPAdisplay : Integer initially all 0;

Mode Class:

mcStatus : {ALTmode, ATTmode, FPAmode} initially ATTmode; 

Terms: 

tARMED : Boolean initially false; 
tCASmode : Boolean initially false; 

tALTpresel, tCASpresel, tFPApresel : Boolean initially all false; 
tNear ≝ mALTdesired - mALTactual < 1200;




Figure 2: Variable Dependency Graph 


Mode Transition Table for mcStatus

Source Mode | Events                                         | Destination Mode
------------+------------------------------------------------+-----------------
ALTmode     | @T(mATTsw = on) OR CHANGED(mALTdesired)        | ATTmode
ALTmode     | @T(mFPAsw = on)                                | FPAmode
ATTmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear)    | ALTmode
ATTmode     | @T(mFPAsw = on) OR @T(mALTsw = on) WHEN        | FPAmode
            | (tALTpresel AND NOT tNear)                     |
FPAmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear) OR | ALTmode
            | @T(tNear) WHEN tARMED                          |
FPAmode     | @T(mATTsw = on) OR @T(mFPAsw = on) OR          | ATTmode
            | CHANGED(mALTdesired) WHEN tARMED               |

Modes            | Events                     |
-----------------+----------------------------+-----------------------
ATTmode, FPAmode | @T(mALTsw = on) WHEN       | @F(mcStatus = FPAmode)
                 | (tALTpresel AND NOT tNear) |
ALTmode          | NEVER                      | @F(mcStatus = FPAmode)
-----------------+----------------------------+-----------------------
tARMED =         | true                       | false


Events     | @T(mCASsw = on) WHEN NOT tCASmode | @T(mCASsw = on) WHEN tCASmode
-----------+-----------------------------------+------------------------------
tCASmode = | true                              | false


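Modes        | Events                    |
-------------+---------------------------+------------------------------
ALTmode      | NEVER                     | @T(mALTdesired = mALTactual)
             |                           | OR @F(INMODE)
FPAmode      | CHANGED(mALTdesired) WHEN | NEVER
             | NOT tARMED                |
ATTmode      | CHANGED(mALTdesired)      | @T(INMODE) OR @T(mFPAsw = on)
-------------+---------------------------+------------------------------
tALTpresel = | true                      | false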


Events       | CHANGED(mCASdesired) | @F(tCASmode) OR @T(mCASdesired =
             |                      | mCASactual) WHEN tCASmode
-------------+----------------------+---------------------------------
tCASpresel = | true                 | false


Events       | CHANGED(mFPAdesired) | @T(mcStatus = ATTmode) OR
             |                      | @T(mcStatus = ALTmode) OR
             |                      | @T(mFPAdesired = mFPAactual) WHEN
             |                      | (mcStatus = FPAmode)
-------------+----------------------+----------------------------------
tFPApresel = | true                 | false


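Conditions    | tALTpresel  | NOT tALTpresel
--------------+-------------+---------------
cALTdisplay = | mALTdesired | mALTactual


Conditions    | tCASpresel  | NOT tCASpresel
--------------+-------------+---------------
cCASdisplay = | mCASdesired | mCASactual


Conditions    | tFPApresel  | NOT tFPApresel
--------------+-------------+---------------
cFPAdisplay = | mFPAdesired | mFPAactual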
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APPLYING THE SCR REQUIREMENTS 
SPECIFICATION METHOD TO PRACTICAL 
SYSTEMS: A CASE STUDY 


Ramesh Bharadwaj and Connie Heitmeyer 

Center for High Assurance Computer Systems 
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Washington, DC 20375 

December 5, 1996 






The SCR Method: A Case Study


The NRL SCR Project


Initial goal: Document requirements of the Operational Flight 
Program (OFP) for the US Navy’s A-7 aircraft. 

Recent work: 

• Formal state machine model for the SCR notation 

• Support tools for analysis and validation of SCR specifications 

• Application to practical systems 
- Lockheed: C-130J OFP 

- US Navy: Torpedo Control Panel for new attack submarine

Motivation

Effectiveness of the Software Requirements Specification (SRS) 
depends on: 

• Precision 

• Correctness: Satisfies critical properties 

• Consistency: Parts are not contradictory 

• Completeness: Captures all required behavior 

• No Implementation Bias 

• Usability:

- Modifiability: Ease of change

- Readability: Customers as well as developers 

- Organization: Reference, review, answers to questions 

• Scalability 

PVS

Prototype Verification System from SRI International. 

• Expressive specification language (based on Higher-Order Logic) 

• Built-in and user-defined theories and strategies 

• Interactive theorem prover 

- Automation of low-level proof steps 

- Powerful decision procedures 

- Automatic rewriting 

- Boolean simplification

Is PVS an effective tool for requirements specification and analysis? 


Idealized Requirements Analysis Process


1. SRS Creation - Capturing requirements in a formal notation. 

2. SRS C&C Checking - Syntax, type, missing cases, unwanted 
nondeterminism, circular definitions. 

3. SRS Validation - Inspection, simulation. 

4. SRS Verification - Theorem proving or model checking. 


Phase                  | PVS                | SCR
-----------------------+--------------------+-------------------
SRS Creation           |                    |
  Guidelines           | None               | SCR Method
  Notation             | Higher-Order Logic | First-Order Logic
  Organization         | Theory             | Tables
  Scalability          | Low                | High
SRS Checking           |                    |
  Syntax and type      | Semi-automatic     | Automatic
  Consistency          | Typechecking       | C&C Checks
SRS Validation         |                    |
  Inspection           | Little support     | Tables ease review
  Simulation           | Not supported      | Symbolic execution
SRS Verification       |                    |
  Checking properties  | Theorem proving    | Model Checking
  User feedback        | None               | Counterexample
  Test case generation | No                 | Yes
  Code generation      | No                 | Possible
So I would argue that if anything, we should be looking for ways to
make PVS more readable for specific problem domains. [. . .] I’d 
rather see scarce resources going towards greater readability. 

Steven P. Miller, Rockwell International. 

If the primary intended users of PVS are logicians and 
mathematicians, then keeping the current syntax [. . .] is a 
reasonable approach. If the primary intended users of PVS are 
practicing engineers, then neither the current syntax nor a 
LISP-like one makes any sense. 

C. Michael Holloway, NASA Langley. 
The SCR Approach to Requirements


• Identify the system outputs (controlled variables) 

• Determine the system inputs (monitored variables) 

• Define auxiliary variables (mode classes and terms) 

• Specify ideal system behavior (functions defined by tables) 

• Specify acceptable system behavior (timing and accuracy) 

[Figure: mode control panel, showing buttons ATTsw, CASsw, FPAsw, and ALTsw]


Monitored Variables

mALTactual, mCASactual, mFPAactual : Integer;
mALTsw, mATTsw, mCASsw, mFPAsw : {on, off};
mALTdesired, mCASdesired, mFPAdesired : Integer;

Controlled Variables

cALTdisplay, cCASdisplay, cFPAdisplay : Integer;

Mode Class

mcStatus : {ALTmode, ATTmode, FPAmode};

Terms

tARMED : Boolean;
tCASmode : Boolean;
tALTpresel, tCASpresel, tFPApresel : Boolean;
tNear : Boolean;

Mode Transition Table for mcStatus

Source Mode | Events                                         | Dest. Mode
------------+------------------------------------------------+-----------
ALTmode     | @T(mATTsw = on) OR CHANGED(mALTdesired)        | ATTmode
ALTmode     | @T(mFPAsw = on)                                | FPAmode
ATTmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear)    | ALTmode
ATTmode     | @T(mFPAsw = on) OR @T(mALTsw = on) WHEN        | FPAmode
            | (tALTpresel AND NOT tNear)                     |
FPAmode     | @T(mALTsw = on) WHEN (tALTpresel AND tNear) OR | ALTmode
            | @T(tNear) WHEN tARMED                          |
FPAmode     | @T(mATTsw = on) OR @T(mFPAsw = on) OR          | ATTmode
            | CHANGED(mALTdesired) WHEN tARMED               |

Mode Transition Table for mcStatus

Source Mode | Events                                  | Destination Mode
------------+-----------------------------------------+-----------------
ALTmode     | @T(mATTsw = on) OR CHANGED(mALTdesired) | ATTmode


• The pilot engages a mode by pressing the corresponding button
on the panel (paragraph 1), i.e., pressing ATTsw should engage
ATTmode OR If the pilot dials in a new altitude while ALTmode
is engaged, then ALTmode is disengaged and ATTmode is engaged
(paragraph 7).

Summary


• The PVS language and prover are designed for defining a
mathematical model and reasoning about its properties

• The SCR notation is a language for system requirements

- E.g., in a PVS specification, one cannot distinguish system
inputs and outputs from dependent variables

- Given a PVS specification, one cannot answer the question,
“What is the required behavior of the system?”

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Panel Discussion: Transferring Best Practices: Why Is It So 
Complex? 

Moderator: Vic Basili, University of Maryland 


Richard DeMillo, Bellcore 

Michael Evangelist, Florida International University 
Peter Freeman, Georgia Institute of Technology 
Allan Willey, Motorola Corporation 
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SEL 21 Panel 

Transferring Best Practices: Why is it so hard? 


Panelists:

Richard DeMillo, Bellcore
Michael Evangelist, Florida International University
Peter Freeman, Georgia Tech
Allan Willey, Motorola


Premise: 

Transferring any technology is very hard. In fact it has been harder than 
most people and organizations believe. For this reason, many organizations 
are unwilling to admit how unsuccessful they have been in transferring or 
sustaining best practices. We would like the panel to react to this 
premise. 




Each Panelist was asked to: 

Give your background and experience with technology transfer. 

Give one or two specific examples of transfer projects you have observed or
participated in:

what procedures were followed in the transfer,

what organizations were involved,

what were the major problems,

what was the cost in time and schedule,

what were the results,

what was the reaction of the participants,

what aspects can you demonstrate were successful,

what would you do differently now?

What can you share with the audience in terms of lessons learned? 
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SEL Panel 

5 December 1996 
NASA/Goddard 
Richard DeMillo, Bellcore 


Contents 

• Bellcore background 

• Examples of Successful Transfer 

- Adapt/XAdvertiser 

- xATAC 

- Programmability 

• Web speed and change 

• Card diagram

• Team vs Transfer 

• Adapt/XModel 
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Bellcore background 

• Divestiture and Sale 

• Core customers and new customers 

• Product Lines and Development Processes 

• ISO and CMM 


Successful Transfer: Advertiser 

• Describe and Market 

• SS/AR team formation 

• AR led with business case 

- advertising would be key 

- scaleability 

- right commercial model 
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Successful Transfer: xATAC 


• AR advocate— top to bottom 

• Did not start from coverage— backing into it 

• Team formation— fear as the motivator 

• AR led with business case 

- third party validation 

- cost-benefit 

- scaleability 


Successful Transfer: 
Programmability

• Long-term research on declarative 
optimization 

• Competitive opportunity 

• LAURE was on the shelf 

• Scaleable technology 
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Web Speed and Change 


• 18 mos to 4 mos development cycles

• 75% solutions that can evolve quickly

• Platforms and features

• Version 1.0 is part of requirements
definition

• Ascendancy of architecture (e.g., availability,
scaleability)

• RAD and Card diagram 


Team vs. Transfer 

• No time for transfer 

• Investment in inventory 

• Radar Screens and accountability 

• Impact of RAD 

• Adapt/XModel 
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Product vs. Process 


• P&L managers are easier to influence 

• Institutional change not needed 

• Adapt/XModel is a result of transfer 
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Transferring Research 
Technology 


Michael Evangelist 
School of Computer Science 
Florida International University 
Miami, FLA 


Personal Observation 
When Do I Transfer Technology?


• Need to believe that the new technology 

- solves a problem 

- works 

- fits 

- has low “cost” 


Three Transfer Examples 

• VERDI, 1985-90 (research prototype) 

• BAL/SRW, 1990-92 (advanced development) 

• PC networking application, December 1996 (product) 
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VERDI 


• Graphical tool for designing distributed systems 

- does all the right things, simply 

• We did it the right way 

• Result : lots of interest, no serious use 

- not commercial quality 

- solved part of the problem 

- didn’t fit environment or culture 

- platform and training costs high 


BAL/SRW

• Workbench for re-engineering legacy BAL programs 

- useful, graphical, status quo 

• We worked closely with users 

• Result : substantial use at a few clients 

- much closer to product level 

- solved urgent problem 

- fit culture but not computing environment 

- hardware cost high 
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PC Networking Application 


• Establishes PPP connection over phone line 

• Numerous hard-to-find bugs, poor technical help 

• Result : no limit on the amount of effort I’ll put into it 

- doesn’t work, but we’re optimistic 

- solves important problem 

- fits system and culture 

- low long-term cost 


Observations 

• Motivation of researchers now less of a problem 

• Education of software engineers a serious concern 

• TT model 

- working engineers educated in standard practice 

- research preps engineering years in advance 

- if you’re not inventing sliced bread, resign yourself 
to incremental transfers 
MOTOROLA

Cellular Infrastructure Group


Motorola Wireless Network Solutions 
Group (WNSG) 

• Approximately 2,400 in the R & D Group 

• Eight locations (today): 

- Arlington Heights (Chicago), IL

- Scottsdale (Phoenix), AZ 

- Ft. Worth, TX 

- Cork, Ireland 

- Tel Aviv, Israel 

- Singapore 

- Osaka, Japan 

- Beijing, China

WNSG Products 


• Cellular Telephone Switches 

- 20+ million LOC 

• Base Stations for Radio-telephony 

- 300 KLOC to 500 KLOC 

• “Intelligent Network” products 

- from 35 KLOC to 500 KLOC 





A Real-life Experience 
Software Inspections 

• Adopted Fagan Inspection Process in mid-’92 

- Many escaped defects 

- Extensive repair costs

- Dissatisfied customers

• WNSG GM Sponsored Effort

• Hired Dr. Michael Fagan to train ALL engineers

- Schedule relief offered to managers

• Set up special-purpose inspection rooms

• Added training and coverage goals to bonuses

- Provided mechanisms for data collection


Summary of Results 

• Benefits realized in first release cycle 

• Spectacular overall 10X reduction in 
customer-found defects 

• Measured improvements in: 

- productivity, 

- on-time delivery, and 

- customer satisfaction 


Adopter Categorization*


*Rogers, Everett M. (1983), Diffusion of Innovations, The Free Press, New York, p. 247.


“Early Adopters” 

• SEI CMM Level 2+ 

• Dissatisfied customers, thus perceived 
need for change 

• Mid-level manager buy-in to Fagan 
inspections 

• Committed staff to address 
implementation issues 

• Collected and shared metrics from the 
start 


“Laggards” 

• SEI CMM Level 1 

• Developing a new product with no 
deliveries, thus no sense of urgency 

• Little mid-level management buy-in to 
Fagan practices 

• No initial metrics tracking 

• No performance audits 

• Claimed not seeing forecasted results 


Confirmations of 
Conventional Wisdom 

• Senior management sponsorship is 
needed, but that’s not enough. 

• New technologies diffuse best where 
there is a sense of urgency. 

• Receiving organizations must provide 
resource support to assure success. 



Transferring Best Practices is 
Complex Because ...

• More than 70% of the U.S. software 
industry is Level 1. 

• Most Technology Transition efforts are 
themselves carried out by Level 1 
organizations. 

• Therefore, Technology Transition today is
done in an immature manner in
immature organizations... ad hoc,
chaotic, non-repeatable, high-risk,
unmeasured, uncontrolled, etc., etc...


Lessons Learned 

• Immature receiving organizations 
present higher risks and more barriers to 
change. 

• The Technology Transition process can be 
immature itself. 

• The Technology Transition process has to 
be tailored to the receiving organization. 
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